Project

General

Profile

Actions

Bug #4219

closed

Suricata 6.0.1 segfault

Added by Anonymous almost 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Target version:
Affected Versions:
Effort:
Difficulty:
Label:

Description

Hello,
Last night round 1h CEST so not much of a load, suricata crashed, not able to find a core dump (is configured in suricata.yaml)

Dec 14 01:06:57 scomp1185 kernel: W#01[47474]: segfault at 5a4 ip 00007fda8d6c2c40 sp 00007fd9d4efb568 error 4 in libc-2.17.so[7fda8d576000+1c4000]
Dec 14 01:07:00 scomp1185 kernel: device ens2f0 left promiscuous mode
Dec 14 01:07:01 scomp1185 kernel: device ens2f1 left promiscuous mode
Dec 14 01:07:01 scomp1185 systemd[1]: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 14 01:07:01 scomp1185 systemd[1]: Unit suricata.service entered failed state.
Dec 14 01:07:01 scomp1185 systemd: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 14 01:07:01 scomp1185 systemd: Unit suricata.service entered failed state.
Dec 14 01:07:01 scomp1185 systemd[1]: suricata.service failed.
Dec 14 01:07:01 scomp1185 systemd: suricata.service failed.

LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID:    RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release:    7.9
Codename:    Maipo

Kernel:  3.10.0-1160.2.2.el7.x86_64 #1 SMP Sat Oct 17 05:06:47 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
SELinux not enabled

This is Suricata version 6.0.1 RELEASE

# cat /proc/net/pf_ring/info 
PF_RING Version          : 7.8.0 (unknown)
Total rings              : 28
Standard (non ZC) Options
Ring slots               : 32767
Slot version             : 18
Capture TX               : No [RX only]
IP Defragment            : No
Socket Mode              : Standard
Cluster Fragment Queue   : 201
Cluster Fragment Discard : 0

Related issues 1 (0 open1 closed)

Related to Task #4257: libhtp 0.5.37ClosedVictor JulienActions
Actions #1

Updated by Victor Julien almost 2 years ago

  • Description updated (diff)
Actions #2

Updated by Victor Julien almost 2 years ago

Think it is going to be hard to determine the cause of this. A segfault in libc is probably related to some kind of memory corruption or double free or something, but w/o more details it will be impossible to say where. Are you able to run it in the foreground in gdb or fix your core dumping setup? Then the next crash might give us more clues.

Actions #3

Updated by Anonymous almost 2 years ago

Hi Victor, I know, any adviced settings regarding coredump? Bit busy here seen a.o. Solarwinds ;)

Actions #4

Updated by Anonymous almost 2 years ago

Crashed again today:
Dec 15 14:43:54 scomp1185 kernel: W#2922400: segfault at 3eac ip 00007fae97d08c40 sp 00007fad41ffa5c8 error 4 in libc-2.17.so[7fae97bbc000+1c4000]
Dec 15 14:43:56 scomp1185 kernel: device ens2f0 left promiscuous mode
Dec 15 14:43:58 scomp1185 kernel: device ens2f1 left promiscuous mode
Dec 15 14:43:58 scomp1185 systemd1: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 15 14:43:58 scomp1185 systemd1: Unit suricata.service entered failed state.
Dec 15 14:43:58 scomp1185 systemd: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 15 14:43:58 scomp1185 systemd1: suricata.service failed.
Dec 15 14:43:58 scomp1185 systemd: Unit suricata.service entered failed state.
Dec 15 14:43:58 scomp1185 systemd: suricata.service failed.

suricata.yaml:

coredump:
max-dump: 2g
#max-dump: unlimited

run-as:
user: suri
group: suri
daemon-directory: "/home/suri/"

Runs a non priviled user, but no core in it's home directory or elsewhere to be found?!

Actions #5

Updated by Anonymous almost 2 years ago

And again last night, please advise on correcting the coredump settings! Running via gdb is a bit tricky seen log rotating and rules update scripts.
Otherwise I've to switch back to a stable 5.0.5 because this is production now failing.

Dec 16 04:19:59 scomp1185 kernel: W#2443752: segfault at 0 ip 00007f26a05d37b0 sp 00007f25657f9568 error 4 in libc-2.17.so[7f26a0485000+1c4000]
Dec 16 04:20:00 scomp1185 kernel: device ens2f0 left promiscuous mode
Dec 16 04:20:02 scomp1185 kernel: device ens2f1 left promiscuous mode
Dec 16 04:20:02 scomp1185 systemd1: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 16 04:20:02 scomp1185 systemd1: Unit suricata.service entered failed state.
Dec 16 04:20:02 scomp1185 systemd: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 16 04:20:02 scomp1185 systemd: Unit suricata.service entered failed state.
Dec 16 04:20:02 scomp1185 systemd1: suricata.service failed.
Dec 16 04:20:02 scomp1185 systemd: suricata.service failed.

Actions #6

Updated by Anonymous almost 2 years ago

Getting better, also crashed a moment ago on an other server:
W#1731108: segfault at 0 ip 00007fad4b47dbe5 sp 00007fac5bffe598 error 4 in libc-2.17.so[7fad4b331000+1c4000]

Switching back to 5.0.5

Actions #7

Updated by Peter Manev almost 2 years ago

Do you start it as a service or on the command line?
The core might be in "/" (root) folder maybe? Did you compile it with debugging enabled to get us a bit more clues ? (for ref "./configure YOUR_USUAL_FLAGS CFLAGS="-ggdb -O0" " )

Actions #8

Updated by Anonymous almost 2 years ago

As a service via systemd systemctl indeed.
No I'm running suricata under a normal account, non root.
No debugging options configured, I'll build 6.0.1 again if you prefer:
./configure --enable-gccprotect --prefix=/usr/ --sysconfdir=/etc/ --localstatedir=/var/ --enable-pfring LD_RUN_PATH=/opt/PF_RING/lib:/usr/lib:/usr/local/lib --with-libpfring-libraries=/usr/local/lib/ --with-libpfring-includes=/opt/PF_RING/include/ --with-libpcap-libraries=/usr/local/lib/ --with-libpcap-includes=/usr/local/include/ --enable-geoip -with-libhs-includes=/usr/local/include/hs/ --with-libhs-libraries=/usr/local/lib64/ --enable-rust

What to add to improve you're debugging options?
TIA!

Actions #9

Updated by Victor Julien almost 2 years ago

  • Status changed from New to Assigned
  • Assignee set to Philippe Antoine
  • Target version set to 6.0.2

I've helped Andre to get a bt and analyze it:

#0  0x00007f7bbfa5dc40 in __memcpy_ssse3 () from /lib64/libc.so.6
#1  0x00000000004e2dc0 in HTPCallbackResponseHeaderData (tx_data=0x7f7b28efb2d0) at app-layer-htp.c:2337
#2  0x00007f7bc3a1116a in htp_hook_run_all (hook=0x12e6350, user_data=user_data@entry=0x7f7b28efb2d0) at htp_hooks.c:127
#3  0x00007f7bc3a157d5 in htp_connp_res_receiver_send_data (connp=connp@entry=0x7f7b8d3c0e90, is_last=is_last@entry=0) at htp_response.c:102
#4  0x00007f7bc3a174b2 in htp_connp_res_data (connp=0x7f7b8d3c0e90, timestamp=<optimized out>, data=<optimized out>, len=<optimized out>) at htp_response.c:1342
#5  0x00000000004e00df in HTPHandleResponseData (f=0x7f7ac4cdea20, htp_state=0x7f7b8ef0a050, pstate=0x7f7b8e38f2f0, input=0x0, input_len=1460, local_data=0x0, flags=26 '\032') at app-layer-htp.c:953
#6  0x00000000004eb756 in AppLayerParserParse (tv=0x182c5850, alp_tctx=0x7f7b1c548340, f=0x7f7ac4cdea20, alproto=1, flags=26 '\032', input=0x0, input_len=1460) at app-layer-parser.c:1258
#7  0x00000000004bbc00 in AppLayerHandleTCPData (tv=0x182c5850, ra_ctx=0x7f7b1c535cc0, p=0x7f7b1c4deb80, f=0x7f7ac4cdea20, ssn=0x7f7b1c60e430, stream=0x7f7b28efb5e0, data=0x0, data_len=1460, flags=26 '\032') at app-layer.c:627
#8  0x0000000000616b12 in ReassembleUpdateAppLayer (tv=0x182c5850, ra_ctx=0x7f7b1c535cc0, ssn=0x7f7b1c60e430, stream=0x7f7b28efb5e0, p=0x7f7b1c4deb80, dir=UPDATE_DIR_PACKET) at stream-tcp-reassemble.c:1111
#9  0x0000000000616f7c in StreamTcpReassembleAppLayer (tv=0x182c5850, ra_ctx=0x7f7b1c535cc0, ssn=0x7f7b1c60e430, stream=0x7f7b1c60e440, p=0x7f7b1c4deb80, dir=UPDATE_DIR_PACKET) at stream-tcp-reassemble.c:1238
#10 0x000000000061824c in StreamTcpReassembleHandleSegment (tv=0x182c5850, ra_ctx=0x7f7b1c535cc0, ssn=0x7f7b1c60e430, stream=0x7f7b1c60e440, p=0x7f7b1c4deb80, pq=0x7f7b1c50c7a0) at stream-tcp-reassemble.c:1900
#11 0x000000000060fcae in StreamTcpPacket (tv=0x182c5850, p=0x7f7b1c4deb80, stt=0x7f7b1c5359b0, pq=0x7f7b1c50c7a0) at stream-tcp.c:4824
#12 0x000000000061075f in StreamTcp (tv=0x182c5850, p=0x7f7b1c4deb80, data=0x7f7b1c5359b0, pq=0x7f7b1c50c7a0) at stream-tcp.c:5194
#13 0x00000000005abfab in FlowWorkerStreamTCPUpdate (tv=0x182c5850, fw=0x7f7b1c50c770, p=0x7f7b1c4deb80, detect_thread=0x7f7b1c61ed10) at flow-worker.c:364
#14 0x00000000005ac0c6 in FlowWorkerFlowTimeout (tv=0x182c5850, p=0x7f7b1c4deb80, fw=0x7f7b1c50c770, detect_thread=0x7f7b1c61ed10) at flow-worker.c:401
#15 0x00000000005ab7c9 in FlowFinish (tv=0x182c5850, f=0x7f7ac4cdea20, fw=0x7f7b1c50c770, detect_thread=0x7f7b1c61ed10) at flow-worker.c:157
#16 0x00000000005ab88c in CheckWorkQueue (tv=0x182c5850, fw=0x7f7b1c50c770, detect_thread=0x7f7b1c61ed10, counters=0x7f7b28efb850, fq=0x7f7b28efb860) at flow-worker.c:183
#17 0x00000000005ac25c in FlowWorkerProcessInjectedFlows (tv=0x182c5850, fw=0x7f7b1c50c770, p=0x7f7b1c4dff00, detect_thread=0x7f7b1c61ed10) at flow-worker.c:447
#18 0x00000000005ac5d6 in FlowWorker (tv=0x182c5850, p=0x7f7b1c4dff00, data=0x7f7b1c50c770) at flow-worker.c:570
#19 0x0000000000621fd6 in TmThreadsSlotVarRun (tv=0x182c5850, p=0x7f7b1c4dff00, slot=0x1a576e70) at tm-threads.c:117
#20 0x00000000005fff61 in TmThreadsSlotProcessPkt (tv=0x182c5850, s=0x1a576e70, p=0x7f7b1c4dff00) at tm-threads.h:192
#21 0x00000000006007c3 in ReceivePfringLoop (tv=0x182c5850, data=0x7f7b1c4e08c0, slot=0x1c78a9a0) at source-pfring.c:415
#22 0x00000000006226ba in TmThreadsSlotPktAcqLoop (td=0x182c5850) at tm-threads.c:312
#23 0x00007f7bc0100ea5 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f7bbfa0f96d in clone () from /lib64/libc.so.6

(gdb) f 1
#1  0x00000000004e2dc0 in HTPCallbackResponseHeaderData (tx_data=0x7f7b28efb2d0) at app-layer-htp.c:2337
2337        memcpy(tx_ud->response_headers_raw + tx_ud->response_headers_raw_len,

(gdb) p *tx_data
$4 = {tx = 0x7f79b02b5af0, data = 0x0, len = 1460, is_last = 0}

#3  0x00007f7bc3a157d5 in htp_connp_res_receiver_send_data (connp=connp@entry=0x7f7b8d3c0e90, is_last=is_last@entry=0) at htp_response.c:102
102         htp_status_t rc = htp_hook_run_all(connp->out_data_receiver_hook, &d);
(gdb) p d
$5 = {tx = 0x7f79b02b5af0, data = 0x0, len = 1460, is_last = 0}
(gdb) l
97          d.tx = connp->out_tx;
98          d.data = connp->out_current_data + connp->out_current_receiver_offset;
99          d.len = connp->out_current_read_offset - connp->out_current_receiver_offset;
100         d.is_last = is_last;
101
102         htp_status_t rc = htp_hook_run_all(connp->out_data_receiver_hook, &d);
103         if (rc != HTP_OK) return rc;
104
105         connp->out_current_receiver_offset = connp->out_current_read_offset;
106
(gdb) p connp->out_current_data 
$6 = (unsigned char *) 0x0
(gdb) p connp->out_current_receiver_offset 
$7 = 0
(gdb) p connp->out_current_read_offset 
$8 = 1460

#4  0x00007f7bc3a174b2 in htp_connp_res_data (connp=0x7f7b8d3c0e90, timestamp=<optimized out>, data=<optimized out>, len=<optimized out>) at htp_response.c:1342
1342                    htp_connp_res_receiver_send_data(connp, 0 /* not last */);

#5  0x00000000004e00df in HTPHandleResponseData (f=0x7f7ac4cdea20, htp_state=0x7f7b8ef0a050, pstate=0x7f7b8e38f2f0, input=0x0, input_len=1460, local_data=0x0, flags=26 '\032') at app-layer-htp.c:953
953             const int r = htp_connp_res_data(hstate->connp, &ts, input, input_len);
(gdb) l
948
949         htp_time_t ts = { f->lastts.tv_sec, f->lastts.tv_usec };
950         htp_tx_t *tx = NULL;
951         size_t consumed = 0;
952         if (input_len > 0) {
953             const int r = htp_connp_res_data(hstate->connp, &ts, input, input_len);
954             switch (r) {
955                 case HTP_STREAM_ERROR:
956                     ret = -1;
957                     break;
(gdb) p input
$9 = (const uint8_t *) 0x0
(gdb) p input_len 
$10 = 1460
(gdb) p/x flags
$11 = 0x1a

First look suggests an issue between suricata and libhtp on timeout handling, where suricata gets a NULL pointer it doesn't expect.

Unclear if that is the issue or if it is a symptom of a deeper issue.

Actions #10

Updated by Philippe Antoine over 1 year ago

Can we print connp->out_state and rc in frame 4 htp_connp_res_data ? and htp_connp_RES_BODY_IDENTITY_CL_KNOWN and htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE and htp_connp_RES_FINALIZE ?

Actions #11

Updated by Victor Julien over 1 year ago

Not sure I understand what you meant with "and htp_connp_RES_BODY_IDENTITY_CL_KNOWN and htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE and htp_connp_RES_FINALIZE ?" What do you need me to do there?

(gdb) f 4
#4  0x00007f7bc3a174b2 in htp_connp_res_data (connp=0x7f7b8d3c0e90, timestamp=<optimized out>, data=<optimized out>, len=<optimized out>) at htp_response.c:1342
1342                    htp_connp_res_receiver_send_data(connp, 0 /* not last */);
(gdb) p connp->out_state
$1 = (int (*)(htp_connp_t *)) 0x7f7bc3a15940 <htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE>
(gdb) p rc
$2 = <optimized out>
(gdb) l
1337            }
1338
1339            if (rc != HTP_OK) {
1340                // Do we need more data?
1341                if ((rc == HTP_DATA) || (rc == HTP_DATA_BUFFER)) {
1342                    htp_connp_res_receiver_send_data(connp, 0 /* not last */);
1343
1344                    if (rc == HTP_DATA_BUFFER) {
1345                        if (htp_connp_res_buffer(connp) != HTP_OK) {
1346                            connp->out_status = HTP_STREAM_ERROR;
(gdb) p htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE
$3 = {htp_status_t (htp_connp_t *)} 0x7f7bc3a15940 <htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE>
(gdb) p htp_connp_RES_FINALIZE
$4 = {htp_status_t (htp_connp_t *)} 0x7f7bc3a16830 <htp_connp_RES_FINALIZE>
Actions #12

Updated by Philippe Antoine over 1 year ago

Thanks, I got the information I need.
I think this highlights a deeper functional bug where we do not call htp_connp_res_receiver_finalize_clear in one scenario to be determined

Actions #13

Updated by Philippe Antoine over 1 year ago

  • Status changed from Assigned to In Review

Gitlab MR

Root bug is that out_data_receiver_hook does not get cleared by calling htp_connp_res_receiver_finalize_clear for every case, especially when htp_tx_state_response_complete_ex gets called while processing the headers
Thus, out_data_receiver_hook got reused for another transaction where it was not valid

Actions #15

Updated by Victor Julien over 1 year ago

Actions #16

Updated by Victor Julien over 1 year ago

  • Status changed from In Review to Closed

Fixed by https://github.com/OISF/libhtp/pull/314, thanks Philippe!

Actions #17

Updated by Philippe Antoine over 1 year ago

  • Status changed from Closed to In Review
Actions #20

Updated by Philippe Antoine over 1 year ago

  • Status changed from In Review to Closed
Actions

Also available in: Atom PDF