Project

General

Profile

Actions

Bug #4582

closed

BUG_ON triggered from TmThreadsInjectFlowById

Added by Jeff Lucovsky over 3 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
Affected Versions:
Effort:
Difficulty:
Label:

Subtasks 1 (0 open1 closed)

Bug #5559: BUG_ON triggered from TmThreadsInjectFlowById (6.0.x backport)ClosedVictor JulienActions

Related issues 1 (0 open1 closed)

Related to Suricata - Bug #4468: Assertion failures in TmThreadsInjectFlowByIdClosedActions
Actions #1

Updated by Jeff Lucovsky over 3 years ago

SIGABRT at large customer with 40Gbps+

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7fee0f4ff640 (LWP 37729))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff0b30bf12e in __GI_abort () at abort.c:79
#2  0x00007ff0b30b680a in __assert_fail_base (fmt=0x7ff0b3210040 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x56501fb0ae88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x56501fb0aef6 "tm-threads.c", line=line@entry=2320,
    function=function@entry=0x56501fb0afd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:92
#3  0x00007ff0b30b6882 in __GI___assert_fail (assertion=assertion@entry=0x56501fb0ae88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x56501fb0aef6 "tm-threads.c", line=line@entry=2320,
    function=function@entry=0x56501fb0afd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:101
#4  0x000056501f830fa0 in TmThreadsInjectFlowById (f=f@entry=0x7fee6d734ac0, id=<optimized out>) at tm-threads.c:2320
#5  0x000056501f7d0469 in FlowForceReassemblyForFlow (f=f@entry=0x7fee6d734ac0) at flow-timeout.c:352
#6  0x000056501f7cd865 in ProcessAsideQueue (td=td@entry=0x7fee0e801030, counters=<optimized out>, counters=<optimized out>) at flow-manager.c:335
#7  0x000056501f7cdd4d in FlowTimeoutHash (td=td@entry=0x7fee0e801030, ts=ts@entry=0x7fee0f4fd310, hash_min=<optimized out>, hash_max=<optimized out>, counters=counters@entry=0x7fee0f4fd330) at flow-manager.c:524
#8  0x000056501f7ce69c in FlowTimeoutHashInChunks (chunks=40, iter=21, counters=0x7fee0f4fd330, hash_max=<optimized out>, hash_min=<optimized out>, ts=0x7fee0f4fd310, td=0x7fee0e801030) at flow-manager.c:554
#9  FlowManager (th_v=0x7fee93f16f80, thread_data=0x7fee0e801000) at flow-manager.c:899
#10 0x000056501f82ef54 in TmThreadsManagement (td=0x7fee93f16f80) at tm-threads.c:551
#11 0x00007ff0b4d2f37e in start_thread (arg=0x7fee0f4ff640) at pthread_create.c:463
#12 0x00007ff0b3180b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) fr 5
#5  0x000056501f7d0469 in FlowForceReassemblyForFlow (f=f@entry=0x7fee6d734ac0) at flow-timeout.c:352
352     in flow-timeout.c
(gdb) p f->thread_id
$5 = {0, 18}

Second occurrence -- similar -- note f->thread_id array

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f19298ff640 (LWP 21542))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f1bcd6f612e in __GI_abort () at abort.c:79
#2  0x00007f1bcd6ed80a in __assert_fail_base (fmt=0x7f1bcd847040 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55b42e378e88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x55b42e378ef6 "tm-threads.c", line=line@entry=2320,
    function=function@entry=0x55b42e378fd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:92
#3  0x00007f1bcd6ed882 in __GI___assert_fail (assertion=assertion@entry=0x55b42e378e88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x55b42e378ef6 "tm-threads.c", line=line@entry=2320,
    function=function@entry=0x55b42e378fd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:101
#4  0x000055b42e09efa0 in TmThreadsInjectFlowById (f=f@entry=0x7f1a8fe28300, id=<optimized out>) at tm-threads.c:2320
#5  0x000055b42e03e469 in FlowForceReassemblyForFlow (f=f@entry=0x7f1a8fe28300) at flow-timeout.c:352
#6  0x000055b42e03b865 in ProcessAsideQueue (td=td@entry=0x7f1928c01030, counters=<optimized out>, counters=<optimized out>) at flow-manager.c:335
#7  0x000055b42e03bd4d in FlowTimeoutHash (td=td@entry=0x7f1928c01030, ts=ts@entry=0x7f19298fd310, hash_min=<optimized out>, hash_max=<optimized out>, counters=counters@entry=0x7f19298fd330) at flow-manager.c:524
#8  0x000055b42e03c69c in FlowTimeoutHashInChunks (chunks=40, iter=30, counters=0x7f19298fd330, hash_max=<optimized out>, hash_min=<optimized out>, ts=0x7f19298fd310, td=0x7f1928c01030) at flow-manager.c:554
#9  FlowManager (th_v=0x7f19ae1fb480, thread_data=0x7f1928c01000) at flow-manager.c:899
#10 0x000055b42e09cf54 in TmThreadsManagement (td=0x7f19ae1fb480) at tm-threads.c:551
#11 0x00007f1bcf36637e in start_thread (arg=0x7f19298ff640) at pthread_create.c:463
#12 0x00007f1bcd7b7b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) fr 5
#5  0x000055b42e03e469 in FlowForceReassemblyForFlow (f=f@entry=0x7f1a8fe28300) at flow-timeout.c:352
352     flow-timeout.c: No such file or directory.
(gdb) p f->thread_id
$1 = {0, 3}
(gdb)

Actions #2

Updated by Jeff Lucovsky over 3 years ago

3rd occurrence -- same customer site

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7fef74eff640 (LWP 5103))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff218aec12e in __GI_abort () at abort.c:79
#2  0x00007ff218ae380a in __assert_fail_base (fmt=0x7ff218c3d040 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55cd9d1f3e88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x55cd9d1f3ef6 "tm-threads.c", line=line@entry=2320,
    function=function@entry=0x55cd9d1f3fd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:92
#3  0x00007ff218ae3882 in __GI___assert_fail (assertion=assertion@entry=0x55cd9d1f3e88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x55cd9d1f3ef6 "tm-threads.c", line=line@entry=2320,
    function=function@entry=0x55cd9d1f3fd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:101
#4  0x000055cd9cf19fa0 in TmThreadsInjectFlowById (f=f@entry=0x7fe1692195c0, id=<optimized out>) at tm-threads.c:2320
#5  0x000055cd9ceb9469 in FlowForceReassemblyForFlow (f=f@entry=0x7fe1692195c0) at flow-timeout.c:352
#6  0x000055cd9ceb6865 in ProcessAsideQueue (td=td@entry=0x7fef74201030, counters=<optimized out>, counters=<optimized out>) at flow-manager.c:335
#7  0x000055cd9ceb6d4d in FlowTimeoutHash (td=td@entry=0x7fef74201030, ts=ts@entry=0x7fef74efd310, hash_min=<optimized out>, hash_max=<optimized out>, counters=counters@entry=0x7fef74efd330) at flow-manager.c:524
#8  0x000055cd9ceb769c in FlowTimeoutHashInChunks (chunks=40, iter=11, counters=0x7fef74efd330, hash_max=<optimized out>, hash_min=<optimized out>, ts=0x7fef74efd310, td=0x7fef74201030) at flow-manager.c:554
#9  FlowManager (th_v=0x7feff9266b40, thread_data=0x7fef74201000) at flow-manager.c:899
#10 0x000055cd9cf17f54 in TmThreadsManagement (td=0x7feff9266b40) at tm-threads.c:551
#11 0x00007ff21a75c37e in start_thread (arg=0x7fef74eff640) at pthread_create.c:463
#12 0x00007ff218badb5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) fr 5
#5  0x000055cd9ceb9469 in FlowForceReassemblyForFlow (f=f@entry=0x7fe1692195c0) at flow-timeout.c:352
352     flow-timeout.c: No such file or directory.
(gdb) p f->thread_id
$1 = {0, 19}

Actions #3

Updated by Jeff Lucovsky about 3 years ago

Occurred again with the patches from #4478 applied.

(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f387060512e in __GI_abort () at abort.c:79
#2  0x00007f38705fc80a in __assert_fail_base (fmt=0x7f3870756040 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5615ef73cde8 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x5615ef73ce56 "tm-threads.c", line=line@entry=2320,
    function=function@entry=0x5615ef73cf30 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:92
#3  0x00007f38705fc882 in __GI___assert_fail (assertion=assertion@entry=0x5615ef73cde8 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x5615ef73ce56 "tm-threads.c", line=line@entry=2320,
    function=function@entry=0x5615ef73cf30 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:101
#4  0x00005615ef462f80 in TmThreadsInjectFlowById (f=f@entry=0x7f369e2cf580, id=<optimized out>) at tm-threads.c:2320
#5  0x00005615ef402439 in FlowForceReassemblyForFlow (f=f@entry=0x7f369e2cf580) at flow-timeout.c:352
#6  0x00005615ef3ff835 in ProcessAsideQueue (td=td@entry=0x7f35cba00030, counters=<optimized out>, counters=<optimized out>) at flow-manager.c:335
#7  0x00005615ef3ffd1d in FlowTimeoutHash (td=td@entry=0x7f35cba00030, ts=ts@entry=0x7f35cc6fd310, hash_min=<optimized out>, hash_max=<optimized out>, counters=counters@entry=0x7f35cc6fd330) at flow-manager.c:524
#8  0x00005615ef40066c in FlowTimeoutHashInChunks (chunks=40, iter=10, counters=0x7f35cc6fd330, hash_max=<optimized out>, hash_min=<optimized out>, ts=0x7f35cc6fd310, td=0x7f35cba00030) at flow-manager.c:554
#9  FlowManager (th_v=0x7f3639169f80, thread_data=0x7f35cba00000) at flow-manager.c:899
#10 0x00005615ef460f34 in TmThreadsManagement (td=0x7f3639169f80) at tm-threads.c:551
#11 0x00007f387227537e in start_thread (arg=0x7f35cc6ff640) at pthread_create.c:463
#12 0x00007f38706c6b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) fr 4
#4  0x00005615ef462f80 in TmThreadsInjectFlowById (f=f@entry=0x7f369e2cf580, id=<optimized out>) at tm-threads.c:2320
2320    in tm-threads.c
(gdb) p f->thread_id
$4 = {0, 30}
Actions #4

Updated by Jeff Lucovsky about 3 years ago

This occurs with regularity at one of our customer sites.

Could slot 0 of thread_id remain the same as its initialized state (0) if traffic if only pkts to the client are observed?

The slots are assigned based on the idx value from HandleThreadId: const int idx = (!(PKT_IS_TOSERVER(p)));

Actions #5

Updated by Victor Julien about 3 years ago

Can you show the contents of the flow? At least the stats in it and p/x f->flags?

Actions #6

Updated by Jeff Lucovsky about 3 years ago

(gdb) p/x f->flags
$1 = 0x10031b
(gdb) p *f
$2 = {src = {address = {address_un_data32 = {3017916995, 0, 0, 0}, address_un_data16 = {49731, 46049, 0, 0, 0, 0, 0, 0}, address_un_data8 = "C\302\341\263", '\000' <repeats 11 times>}}, dst = {address = {address_un_data32 = {1348125032, 0, 0, 0}, address_un_data16 = {
        49512, 20570, 0, 0, 0, 0, 0, 0}, address_un_data8 = "h\301ZP", '\000' <repeats 11 times>}}, {sp = 61436, icmp_s = {type = 252 '\374', code = 239 '\357'}}, {dp = 443, icmp_d = {type = 187 '\273', code = 1 '\001'}}, proto = 6 '\006', recursion_level = 0 '\000',
  vlan_id = {0, 0}, use_cnt = 0, vlan_idx = 0 '\000', {{ffr_ts = 1 '\001', ffr_tc = 1 '\001'}, ffr = 17 '\021'}, timeout_at = 1630017823, thread_id = {0, 30}, next = 0x0, livedev = 0x0, flow_hash = 2575188026, lastts = {tv_sec = 1630017523, tv_usec = 544004},
  timeout_policy = 300, flow_state = 1, tenant_id = 0, probing_parser_toserver_alproto_masks = 0, probing_parser_toclient_alproto_masks = 0, flags = 1049371, file_flags = 3087, protodetect_dp = 0, parent_id = 0, m = {__data = {__lock = 1, __count = 0, __owner = 10851,
      __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001\000\000\000\000\000\000\000c*\000\000\001", '\000' <repeats 26 times>, __align = 1}, protoctx = 0x7f346926b480, protomap = 0 '\000',
  flow_end_flags = 0 '\000', alproto = 0, alproto_ts = 0, alproto_tc = 0, alproto_orig = 0, alproto_expect = 0, de_ctx_version = 99, min_ttl_toserver = 123 '{', max_ttl_toserver = 123 '{', min_ttl_toclient = 49 '1', max_ttl_toclient = 49 '1', alparser = 0x0,
  alstate = 0x0, sgh_toclient = 0x7f3633331550, sgh_toserver = 0x7f363314c130, flowvar = 0x0, fb = 0x0, startts = {tv_sec = 1630017482, tv_usec = 114984}, todstpktcnt = 1, tosrcpktcnt = 101, todstbytecnt = 64, tosrcbytecnt = 8464}
Actions #7

Updated by Victor Julien about 3 years ago

Can you also add p *(TcpSession *)f->protoctx?

Actions #8

Updated by Jeff Lucovsky about 3 years ago

(gdb) p *(TcpSession *)f->protoctx
$4 = {res = 29, state = 4 '\004', pstate = 0 '\000', queue_len = 0 '\000', data_first_seen_dir = 8 '\b', tcp_packet_flags = 25 '\031', flags = 5123, reassembly_depth = 5242880, server = {flags = 0, wscale = 14, os_policy = 0 '\000', tcp_flags = 25 '\031',
    isn = 993559311, next_seq = 993559312, last_ack = 993559312, next_win = 993559312, window = 0, last_ts = 0, last_pkt_ts = 0, base_seq = 993559312, app_progress_rel = 0, raw_progress_rel = 0, log_progress_rel = 0, min_inspect_depth = 0, data_required = 0, sb = {
      cfg = 0x5615efa7a550 <stream_config+48>, stream_offset = 0, buf = 0x0, buf_size = 0, buf_offset = 0, sbb_tree = {rbh_root = 0x0}, head = 0x0}, seg_tree = {rbh_root = 0x0}, segs_right_edge = 0, sack_size = 0, sack_tree = {rbh_root = 0x0}}, client = {flags = 0,
    wscale = 14, os_policy = 12 '\f', tcp_flags = 0 '\000', isn = 3770227041, next_seq = 3770227088, last_ack = 3770227042, next_win = 3779647842, window = 9420800, last_ts = 0, last_pkt_ts = 0, base_seq = 3770227042, app_progress_rel = 0, raw_progress_rel = 0,
    log_progress_rel = 0, min_inspect_depth = 0, data_required = 0, sb = {cfg = 0x5615efa7a550 <stream_config+48>, stream_offset = 0, buf = 0x7f3442c3e800 "\027\003\003", buf_size = 2048, buf_offset = 46, sbb_tree = {rbh_root = 0x0}, head = 0x0}, seg_tree = {
      rbh_root = 0x7f3442ae0620}, segs_right_edge = 3770227088, sack_size = 0, sack_tree = {rbh_root = 0x0}}, queue = 0x0}
Actions #9

Updated by Victor Julien about 3 years ago

The following flow flags are set

FLOW_IPV4
FLOW_TO_SRC_SEEN
FLOW_TO_DST_SEEN
FLOW_TOSERVER_IPONLY_SET
FLOW_TOCLIENT_IPONLY_SET
FLOW_SGH_TOSERVER
FLOW_SGH_TOCLIENT

So nothing special.

Traffic is TCP. We've seen a single packet to the server direction (todstpktcnt = 1). There is a single path that could avoid us to reach HandleThreadId, which is when the packet had a bad checksum. Do you see any bad checksums in the stats of this sensor?

Actions #10

Updated by Jeff Lucovsky about 3 years ago

The data from the time of the crash is not available ... We do see many occurrences of non-zero invalid_checksum , e.g., 38470 invalid checksums out of 75,644,142 packets

Actions #11

Updated by Victor Julien about 3 years ago

Thanks, I think the TcpSession internals support the idea that no toserver packet made it past that checksum check, thus not reaching the HandleThreadId func.

Can you try:

diff --git a/src/stream-tcp.c b/src/stream-tcp.c
index 8d3611285..8ee4aefb1 100644
--- a/src/stream-tcp.c
+++ b/src/stream-tcp.c
@@ -4785,8 +4785,6 @@ int StreamTcpPacket (ThreadVars *tv, Packet *p, StreamTcpThread *stt,

     SCLogDebug("p->pcap_cnt %"PRIu64, p->pcap_cnt);

-    HandleThreadId(tv, p, stt);
-
     TcpSession *ssn = (TcpSession *)p->flow->protoctx;

     /* track TCP flags */
@@ -5207,6 +5205,8 @@ TmEcode StreamTcp (ThreadVars *tv, Packet *p, void *data, PacketQueueNoLock *pq)
         return TM_ECODE_OK;
     }

+    HandleThreadId(tv, p, stt);
+
     /* only TCP packets with a flow from here */

     if (!(p->flags & PKT_PSEUDO_STREAM_END)) {

If you can't try a patch, a work around could be to disable csum checks in the stream engine config.

Actions #12

Updated by Jeff Lucovsky about 3 years ago

The patch was installed on 30 August mid-day and has been running without issue. The frequency of the crash was low and it's not clear what traffic caused the issue to occur. Thus, the patch is not harmful and success cannot be determined at this point.

Actions #13

Updated by Victor Julien about 3 years ago

  • Related to Bug #4468: Assertion failures in TmThreadsInjectFlowById added
Actions #14

Updated by Chris Sibley about 2 years ago

Hi Jeff,

We recently upgraded to Suricata 6.0.6 and have been consistently experiencing crashes that appear to be related to this issue. Specifically, the process crashes with the following error:

suricata: tm-threads.c:2365: TmThreadsInjectFlowById: Assertion `!(id <= 0 || id > (int)thread_store.threads_size)' failed.

systemd reports:

Sep 19 11:43:45 rsmtnmri03 systemd1: suricata.service: Main process exited, code=killed, status=6/ABRT
Sep 19 11:43:45 rsmtnmri03 systemd1: suricata.service: Failed with result 'signal'.
Sep 19 11:43:45 rsmtnmri03 systemd1: suricata.service: Consumed 10h 52min 27.523s CPU time.

After finding this bug, I updated the suricata configuration to disable stream engine checksum checks and suricata has been stable ever since. We have a second server, processing the same traffic, that I have not made the configuration change on and it continues to crash.

Thanks,

--Chris

Actions #15

Updated by Victor Julien about 2 years ago

  • Status changed from New to Assigned
  • Assignee set to Victor Julien
  • Target version set to 6.0.7
Actions #16

Updated by Victor Julien about 2 years ago

  • Status changed from Assigned to In Progress
  • Target version changed from 6.0.7 to 7.0.0-beta1
  • Label Needs backport to 6.0 added
Actions #17

Updated by Victor Julien about 2 years ago

  • Subtask #5559 added
Actions #18

Updated by Victor Julien about 2 years ago

  • Status changed from In Progress to In Review
Actions #19

Updated by Victor Julien about 2 years ago

  • Status changed from In Review to Closed
  • Label deleted (Needs backport to 6.0)
Actions

Also available in: Atom PDF