Bug #2993
closedSuricata 5.0.0beta1 memory allocation of 4294966034 bytes failed
Description
Today suricata stopped after running for almost 9h:
May 27 12:05:22 server suricata28762: memory allocation of 4294966034 bytes failed
May 27 12:05:25 server systemd1: suricata.service: main process exited, code=killed, status=6/ABRT
May 27 12:05:25 server systemd1: Unit suricata.service entered failed state.
May 27 12:05:25 server systemd1: suricata.service failed.
Could not find a core file though I configured a home directory and core dump in suricata.yaml
As far as I can see the system was not running out of any resources.
- suricata -V
This is Suricata version 5.0.0-beta1 RELEASE
- ldd `which suricata`
linux-vdso.so.1 => (0x00007ffd769c7000)
libhtp.so.2 => /lib/libhtp.so.2 (0x00007fd563c32000)
librt.so.1 => /lib64/librt.so.1 (0x00007fd563a2a000)
libm.so.6 => /lib64/libm.so.6 (0x00007fd563728000)
liblz4.so.1 => /lib64/liblz4.so.1 (0x00007fd563513000)
libGeoIP.so.1 => /lib64/libGeoIP.so.1 (0x00007fd5632e3000)
libmagic.so.1 => /lib64/libmagic.so.1 (0x00007fd5630c6000)
libcap-ng.so.0 => /lib64/libcap-ng.so.0 (0x00007fd562ec0000)
libpfring.so.1 => /usr/local/lib/libpfring.so.1 (0x00007fd562c28000)
libpcap.so.1 => /usr/local/lib/libpcap.so.1 (0x00007fd562955000)
libnet.so.1 => /lib64/libnet.so.1 (0x00007fd56273b000)
libjansson.so.4 => /lib64/libjansson.so.4 (0x00007fd56252e000)
libyaml-0.so.2 => /lib64/libyaml-0.so.2 (0x00007fd56230e000)
libhs.so.5 => /usr/local/lib64/libhs.so.5 (0x00007fd5619f5000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fd561793000)
libz.so.1 => /lib64/libz.so.1 (0x00007fd56157d000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fd561357000)
libssl3.so => /lib64/libssl3.so (0x00007fd561105000)
libsmime3.so => /lib64/libsmime3.so (0x00007fd560ede000)
libnss3.so => /lib64/libnss3.so (0x00007fd560bb1000)
libnssutil3.so => /lib64/libnssutil3.so (0x00007fd560981000)
libplds4.so => /lib64/libplds4.so (0x00007fd56077d000)
libplc4.so => /lib64/libplc4.so (0x00007fd560578000)
libnspr4.so => /lib64/libnspr4.so (0x00007fd56033a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd56011e000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fd55ff1a000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd55fd04000)
libc.so.6 => /lib64/libc.so.6 (0x00007fd55f937000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd563e51000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd55f630000)
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.6 (Maipo)
Release: 7.6
Codename: Maipo
Kernel 3.10.0-957.12.1.el7.x86_64
pfring : 7.5.0 dev
Files
Updated by Peter Manev over 5 years ago
Hi,
Thank you for the report!
Was this a default config run or there were changes made to the memcap/ring settings ?
Updated by Anonymous over 5 years ago
- cat /etc/modprobe.d/pf_ring.conf
options pf_ring transparent_mode=0 min_num_slots=32767 enable_tx_capture=0
Updated by Andreas Herz over 5 years ago
- Assignee set to OISF Dev
- Target version set to Support
Can you share some more details? Like suricata --build-info, the configuration file and it would be perfect if you can reproduce it with a coredump.
At https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Debugging you can find more information how to debug this.
Updated by Anonymous over 5 years ago
- File suricata.yaml suricata.yaml added
Config file attached, maybe the settings are not correct to create a core dump because I was not able to find one after the crash yesterday?
# suricata --build-info This is Suricata version 5.0.0-beta1 RELEASE Features: PCAP_SET_BUFF PF_RING AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG LIBNET1.1 HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HAVE_LIBJANSSON TLS MAGIC RUST SIMD support: SSE_4_2 SSE_4_1 SSE_3 Atomic intrisics: 1 2 4 8 16 byte(s) 64-bits, Little-endian architecture GCC version 4.8.5 20150623 (Red Hat 4.8.5-36), C version 199901 compiled with -fstack-protector compiled with _FORTIFY_SOURCE=2 L1 cache line size (CLS)=64 thread local storage method: __thread compiled with LibHTP v0.5.30, linked against LibHTP v0.5.30 Suricata Configuration: AF_PACKET support: yes eBPF support: no XDP support: no PF_RING support: yes NFQueue support: no NFLOG support: no IPFW support: no Netmap support: no DAG enabled: no Napatech enabled: no WinDivert enabled: no Unix socket enabled: yes Detection enabled: yes Libmagic support: yes libnss support: yes libnspr support: yes libjansson support: yes liblzma support: yes hiredis support: no hiredis async with libevent: no Prelude support: no PCRE jit: yes LUA support: no libluajit: no libgeoip: yes Non-bundled htp: no Old barnyard2 support: no Hyperscan support: yes Libnet support: yes liblz4 support: yes Rust support: yes Rust strict mode: no Rust debug mode: no Rust compiler: rustc 1.34.0 Rust cargo: cargo 1.34.0 Python support: yes Python path: /bin/python2.7 Python version: Python distutils yes Python yaml yes Install suricatactl: yes Install suricatasc: yes Install suricata-update: yes Profiling enabled: no Profiling locks enabled: no Development settings: Coccinelle / spatch: no Unit tests enabled: no Debug output enabled: no Debug validation enabled: no Generic build parameters: Installation prefix: /usr Configuration directory: /etc/suricata/ Log directory: /var/log/suricata/ --prefix /usr --sysconfdir /etc --localstatedir /var --datarootdir /usr/share Host: x86_64-pc-linux-gnu Compiler: gcc (exec name) / gcc (real) GCC Protect enabled: yes GCC march native enabled: yes GCC Profile enabled: no Position Independent Executable enabled: no CFLAGS -g -O2 -march=native -I${srcdir}/../rust/gen/c-headers PCAP_CFLAGS -I/usr/local/include SECCFLAGS -fstack-protector -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security
Updated by Peter Manev over 5 years ago
How much memory you have on the box btw in total? (and is it only suricata running or something else that can be resource intensive ?)
Updated by Anonymous over 5 years ago
64Gb RAM available and indeed an ELK stack consumes almost half of the available RAM, but this setup has been running for years quite stable. On the other we want/need to move the Elasticsearch setup to a newly to build cluster.
Updated by Peter Manev over 5 years ago
I think it is a good candidate reason in some conditions (aka an intense ES search) for not being able to have that needed extra allocation. In general yes - could be better if things are separated as i suspect resource contention would also occur(if not yet) in terms of CPU too.
Updated by Victor Julien over 5 years ago
I don't think this has anything to do with the system or what else is running on it. The error suggests suri tried to alloc almost 4GB in a single malloc, which is certainly a bug. It smells like an int underflow.
Updated by Anonymous over 5 years ago
Indeed, but I had a look at some system resources when the crash occured, but neither Zabbix nor sar did mention lack of memory or cpu. Oke is is a busy system, but that keeps the dust away ;)
Updated by Anonymous over 5 years ago
Running v5 since May second, it happend twice so far:
2019-05-27T12:05:22.331122+02:00 server suricata: memory allocation of 4294966034 bytes failed
2019-05-21T09:44:12.279565+02:00 server suricata: memory allocation of 4294926257 bytes failed
Suricata is restarted daily to rotate logs.
Updated by Peter Manev over 5 years ago
Is it possible to correlate the sys/ram utilization at that time ? Just to make sure it is not sys/mem related I was thinking.
Updated by Anonymous over 5 years ago
May 27 crash at 12:05 :
CPU %user %nice %system %iowait %steal %idle 11:30:02 AM all 38.47 5.57 12.01 3.43 0.00 40.53 11:40:02 AM all 37.36 5.61 11.57 4.06 0.00 41.40 11:50:01 AM all 38.84 5.73 11.63 4.01 0.00 39.79 12:00:02 PM all 38.99 5.63 12.03 3.78 0.00 39.57 12:10:01 PM all 17.96 2.53 5.88 2.80 0.00 70.82 12:20:01 PM all 1.27 0.08 0.46 0.17 0.00 98.02 12:30:01 PM all 1.06 0.07 0.43 0.05 0.00 98.39 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty 11:30:02 AM 429940 65502432 99.35 2784 2184880 42437912 57.10 27393028 3474244 87552 11:40:02 AM 434328 65498044 99.34 2732 2262996 42416296 57.07 27314036 3541212 26828 11:50:01 AM 533528 65398844 99.19 2568 1451616 42286528 56.90 27520568 3257140 27148 12:00:02 PM 598556 65333816 99.09 2700 1085900 42580564 57.29 27543920 3160296 78272 12:10:01 PM 26531992 39400380 59.76 6572 4556032 35589444 47.89 2030348 4324928 2220 12:20:01 PM 25655172 40277200 61.09 23952 5361284 35609344 47.91 2549072 4669684 3396 12:30:01 PM 25304540 40627832 61.62 24840 5701996 35586908 47.88 2782596 4777068 2944 kbswpfree kbswpused %swpused kbswpcad %swpcad 11:30:02 AM 2970544 5418060 64.59 232280 4.29 11:40:02 AM 2714704 5673900 67.64 229024 4.04 11:50:01 AM 2390440 5998164 71.50 249900 4.17 12:00:02 PM 2020524 6368080 75.91 220356 3.46 12:10:01 PM 3570620 4817984 57.43 91680 1.90 12:20:01 PM 3576252 4812352 57.37 121428 2.52 12:30:01 PM 3579580 4809024 57.33 124324 2.59
May 21 crash at 09:44 :
CPU %user %nice %system %iowait %steal %idle 09:00:01 AM all 33.32 7.42 9.71 2.61 0.00 46.94 09:10:01 AM all 32.51 6.73 10.10 2.17 0.00 48.49 09:20:02 AM all 32.09 6.75 10.27 2.50 0.00 48.40 09:30:02 AM all 32.20 6.41 9.55 2.37 0.00 49.46 09:40:02 AM all 34.08 6.48 10.17 4.98 0.00 44.29 09:50:01 AM all 16.93 2.75 5.31 2.47 0.00 72.54 10:00:02 AM all 0.94 0.08 0.45 0.05 0.00 98.49 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty 09:00:01 AM 684396 65247976 98.96 6620 12489868 39250032 52.81 21870676 8354172 313188 09:10:01 AM 768752 65163620 98.83 6884 10145636 40047656 53.88 22893100 7287660 417328 09:20:02 AM 560148 65372224 99.15 6848 7709440 40094496 53.95 24282280 6187248 390432 09:30:02 AM 669136 65263236 98.99 5820 6772564 39834916 53.60 24672960 5765980 261164 09:40:02 AM 436924 65495448 99.34 2392 774276 40230372 54.13 27868580 3037652 32116 09:50:01 AM 29049272 36883100 55.94 25652 2155684 35554456 47.84 2226176 1659824 1304 10:00:02 AM 28781336 37151036 56.35 26612 2405776 35554344 47.84 2534504 1611564 96
Updated by Anonymous over 5 years ago
kbswpfree kbswpused %swpused kbswpcad %swpcad 09:00:01 AM 3479724 4908880 58.52 301672 6.15 09:10:01 AM 3473240 4915364 58.60 300224 6.11 09:20:02 AM 3468888 4919716 58.65 278480 5.66 09:30:02 AM 3457368 4931236 58.78 240940 4.89 09:40:02 AM 2628548 5760056 68.67 122640 2.13 09:50:01 AM 3331300 5057304 60.29 125436 2.48 10:00:02 AM 3338468 5050136 60.20 128456 2.54
Updated by Andreas Herz over 5 years ago
Might be worth to test AF_PACKET instead of PF_RING and a more recent kernel maybe?
Updated by Victor Julien about 5 years ago
I think it would be good to set a maximum to what a single allocation can try to get. Determining what that max would be is of course the trick.
Updated by Victor Julien about 5 years ago
Possibly related:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x00007fb803076801 in __GI_abort () at abort.c:79 #2 0x00005619886e5367 in std::sys::unix::abort_internal () #3 0x00005619886f39fd in rust_oom () #4 0x0000561988708b67 in alloc::alloc::handle_alloc_error () #5 0x000056198860a181 in alloc::raw_vec::RawVec<T,A>::allocate_in (cap=4294965836, zeroed=<optimized out>, a=...) at /usr/src/rustc-1.37.0/src/liballoc/raw_vec.rs:99 #6 alloc::raw_vec::RawVec<T>::with_capacity_zeroed (cap=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/raw_vec.rs:145 #7 <u8 as alloc::vec::SpecFromElem>::from_elem (elem=0, n=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/vec.rs:1635 #8 alloc::vec::from_elem (elem=0, n=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/vec.rs:1614 #9 suricata::smb::smb::SMBState::parse_tcp_data_tc_gap (self=0x7fb7f47b6630, gap_size=<optimized out>) at src/smb/smb.rs:1734 #10 rs_smb_parse_response_tcp_gap (state=state@entry=0x7fb7f47b6630, input_len=<optimized out>, input_len@entry=4294965836) at src/smb/smb.rs:1864 #11 0x00005619883f766e in SMBTCPParseResponse (f=0x56198a88b380, state=0x7fb7f47b6630, pstate=0x7fb7f4611760, input=0x0, input_len=4294965836, local_data=0x0, flags=24 '\030') at app-layer-smb.c:68 #12 0x00005619883f6e06 in AppLayerParserParse (tv=tv@entry=0x56199f507920, alp_tctx=0x7fb7f429eeb0, f=f@entry=0x56198a88b380, alproto=8, flags=<optimized out>, input=input@entry=0x0, input_len=input_len@entry=4294965836) at app-layer-parser.c:1225 #13 0x00005619883b1cd6 in AppLayerHandleTCPData (tv=tv@entry=0x56199f507920, ra_ctx=ra_ctx@entry=0x7fb7f429ea90, p=p@entry=0x7fb7f4268180, f=0x56198a88b380, ssn=ssn@entry=0x7fb7f4359620, stream=stream@entry=0x7fb7fd3ae748, data=<optimized out>, data@entry=0x0, data_len=<optimized out>, flags=<optimized out>) at app-layer.c:601 #14 0x00005619885483d0 in ReassembleUpdateAppLayer (dir=<optimized out>, p=<optimized out>, stream=<optimized out>, ssn=<optimized out>, ra_ctx=<optimized out>, tv=<optimized out>) at stream-tcp-reassemble.c:1012 #15 StreamTcpReassembleAppLayer (tv=tv@entry=0x56199f507920, ra_ctx=ra_ctx@entry=0x7fb7f429ea90, ssn=ssn@entry=0x7fb7f4359620, stream=<optimized out>, stream@entry=0x7fb7f4359630, p=p@entry=0x7fb7f4268180, dir=dir@entry=UPDATE_DIR_PACKET) at stream-tcp-reassemble.c:1123 #16 0x000056198854905c in StreamTcpReassembleHandleSegment (tv=tv@entry=0x56199f507920, ra_ctx=0x7fb7f429ea90, ssn=ssn@entry=0x7fb7f4359620, stream=0x7fb7f4359630, p=p@entry=0x7fb7f4268180, pq=pq@entry=0x7fb7f429e738) at stream-tcp-reassemble.c:1788 #17 0x000056198853e33f in HandleEstablishedPacketToClient (stt=<optimized out>, pq=<optimized out>, p=<optimized out>, ssn=<optimized out>, tv=<optimized out>) at stream-tcp.c:2408 #18 StreamTcpPacketStateEstablished (tv=0x56199f507920, p=0x7fb7f4268180, stt=stt@entry=0x7fb7f429e730, ssn=0x7fb7f4359620, pq=0x7fb7f429e738) at stream-tcp.c:2645 #19 0x0000561988541870 in StreamTcpStateDispatch (tv=0x56199f507920, p=0x7fb7f4268180, stt=0x7fb7f429e730, ssn=0x7fb7f4359620, pq=0x7fb7f429e738, state=<optimized out>) at stream-tcp.c:4650 #20 0x0000561988543bd9 in StreamTcpPacket (tv=0x56199f507920, p=0x7fb7f4268180, stt=0x7fb7f429e730, pq=0x7fb7f428d5f8) at stream-tcp.c:4838 #21 0x0000561988544505 in StreamTcp (tv=tv@entry=0x56199f507920, p=p@entry=0x7fb7f4268180, data=<optimized out>, pq=pq@entry=0x7fb7f428d5f8, postpq=postpq@entry=0x0) at stream-tcp.c:5174 #22 0x00005619884cc4b2 in FlowWorker (tv=0x56199f507920, p=0x7fb7f4268180, data=0x7fb7f428d5d0, preq=0x56199f507d00, unused=<optimized out>) at flow-worker.c:245 #23 0x000056198855308c in TmThreadsSlotVarRun (tv=tv@entry=0x56199f507920, p=p@entry=0x7fb7f4268180, slot=slot@entry=0x56199f507b80) at tm-threads.c:130 #24 0x000056198852ae4e in TmThreadsSlotProcessPkt (p=0x7fb7f4268180, s=0x56199f507b80, tv=0x56199f507920) at tm-threads.h:162 #25 AFPReadFromRing (ptv=<optimized out>) at source-af-packet.c:1025 #26 0x000056198852ecd6 in ReceiveAFPLoop (tv=0x56199f507920, data=<optimized out>, slot=<optimized out>) at source-af-packet.c:1589 #27 0x000056198855430d in TmThreadsSlotPktAcqLoop (td=0x56199f507920) at tm-threads.c:335 #28 0x00007fb803cf36db in start_thread (arg=0x7fb7fd3b0700) at pthread_create.c:463 #29 0x00007fb80315788f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Updated by Victor Julien almost 5 years ago
- Status changed from New to Assigned
- Assignee changed from OISF Dev to Victor Julien
- Target version changed from Support to 5.0.2
- Affected Versions 5.0.0, 5.0.1 added
Updated by Victor Julien almost 5 years ago
- Status changed from Assigned to Closed