Bug #6066 (Closed)

Memory Corruption in util-streaming-buffer

Added by Todd Mortimer 12 months ago. Updated 11 months ago.

Status: Closed
Priority: High
Assignee: Victor Julien
Target version: 7.0.0-rc2

Description

While running a recent version of Suricata (commit 2ddd26446e3, 12 Apr 2023), we have been getting crashes related to memory corruption in the streaming buffer.

Our suricata.yaml has:

stream:
  inline: yes
  midstream: true
  async-oneside: true

We noticed that the instability appeared when we switched to inline: yes.

The crashes manifest as seemingly random memory corruption. Using ASan, we found a large write (>4 GB) in GrowRegionToSize() at util-streaming-buffer.c:609.

The backtrace was:

GrowRegionToSize                            util-streaming-buffer.c:609  https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L609
BufferInsertAtRegionConsolidate             util-streaming-buffer.c:1109 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1109
BufferInsertAtRegionDo                      util-streaming-buffer.c:1245 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1245
BufferInsertAtRegion                        util-streaming-buffer.c:1311 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1311
StreamingBufferInsertAt                     util-streaming-buffer.c:1331 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1331
InsertSegmentDataCustom                     stream-tcp-list.c:99         https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp-list.c#L99
StreamTcpReassembleInsertSegment            stream-tcp-list.c:650        https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp-list.c#L650
StreamTcpReassembleHandleSegmentHandleData  stream-tcp-reassemble.c:804  https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp-reassemble.c#L804
StreamTcpReassembleHandleSegment            stream-tcp-reassemble.c:2031 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp-reassemble.c#L2031
HandleEstablisherPacketToClient             stream-tcp.c:2813            https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L2813
StreamTcpPacketStateEstablished             stream-tcp.c:3225            https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L3225
StreamTcpStateDispatch                      stream-tcp.c:5238            https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L5238            
StreamTcpPacket                             stream-tcp.c:5447            https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L5447
StreamTcp                                   stream-tcp.c:5759            https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L5759
FlowWorkerStreamTCPUpdate                   flow-worker.c:377            https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/flow-worker.c#L377
FlowWorker                                  flow-worker.c:555            https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/flow-worker.c#L555
TmThreadsSlotVarRun                         tm-threads.c:119             https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/tm-threads.c#L119
TmThreadsSlotProcessPkt                     tm-threads.h:194             https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/tm-threads.h#L194
ReceiveDPDKLoop                             source-dpdk.c:413            https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/source-dpdk.c#L413
TmThreadsSlotPktAcqLoop                     tm-threads.c:314             https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/tm-threads.c#L314

We have identified that the problem is here:

https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#LL605C1-L609C30

    /* for safe printing and general caution, lets memset the
     * new data to 0 */
    size_t diff = grow - region->buf_size;
    void *new_mem = ((char *)ptr) + region->buf_size;
    memset(new_mem, 0, diff);

Specifically, in our case `grow` is less than `region->buf_size`, so the unsigned calculation of `diff` underflows to a huge value and the `memset()` writes NUL bytes across a large chunk of memory.

We have a workaround that simply identifies when this condition will occur and returns `-1` to indicate an error before reallocating the buffer. When this happens, the requested new `size` passed to `GrowRegionToSize()` is actually less than the existing `region->buf_size`, so the call is trying to shrink the buffer. Simply returning an error here seems effective at preventing the memory corruption (and subsequent segfault), but it is obviously not the right solution, since it does not handle the reassembly correctly.
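
To make the failure mode concrete, here is a minimal standalone sketch (illustrative code only, not the actual Suricata function; the names mirror the snippet above, and the sizes are of the kind we see in real crashes, per note 3 below):

    #include <stddef.h>
    #include <stdio.h>

    /* Standalone illustration of the underflow; not Suricata code. */
    int main(void)
    {
        const size_t buf_size = 743424; /* existing region->buf_size */
        const size_t grow = 335872;     /* requested size: smaller! */

        /* the unsigned subtraction from GrowRegionToSize() wraps around */
        size_t diff = grow - buf_size;
        printf("diff = %zu\n", diff); /* 18446744073709144064 on 64-bit */

        /* our workaround is equivalent to checking for the shrink request
         * up front and returning -1 before any realloc/memset happens */
        if (grow < buf_size)
            fprintf(stderr, "refusing to shrink region\n");
        return 0;
    }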

After applying this workaround, Suricata stability is much improved, but now we seem to be hitting an assert (which may or may not be related; it is possible these are two separate issues):

https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1482
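
Judging from the title of related Bug #6117 below, the condition checked at that line appears to be that `region->stream_offset == sbb->offset && region->buf_offset > sbb->len` must never hold; that is, seemingly, a block starting exactly at the region's stream offset should cover at least as much data as the region has buffered.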

The backtrace on this assert is (sorry, no line numbers):

StreamingBufferSBBGetData()
StreamReassembleRaw()
PrefilterPktStream()
Prefilter()
DetectRun.part.0()
Detect()
FlowWorker()
TmThreadsSlotVarRun()
ReceiveDPDKLoop()
TmThreadsSlotPktAcqLoop()

It isn't obvious what is going wrong to make us hit this assert, but looking at the region / SBB segment selection logic, we are trying the following patch:

--- a/src/util-streaming-buffer.c
+++ b/src/util-streaming-buffer.c
@@ -72,10 +72,10 @@ static inline int InclusiveCompare(StreamingBufferBlock *lookup, StreamingBuffer
     const uint64_t tre = intree->offset + intree->len;
     if (lre <= intree->offset)   // entirely before
         return -1;
-    else if (lookup->offset < tre && lre <= tre) // (some) overlap
-        return 0;
-    else
-        return 1;   // entirely after
+    if (tre <= lookup->offset) // entirely after
+       return 1;
+
+    return 0; // (some) overlap
 }

The existing logic does not identify the right-hand overlap case `(lookup->offset < tre && lre > tre)` as overlap (it identifies it as entirely after), but we haven't been able to determine whether this patch is effective or not.
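
To illustrate the case we mean, here is a small standalone harness (hypothetical test code, not part of the patch) comparing the two versions on a right-hand overlap:

    #include <assert.h>
    #include <stdint.h>

    /* Minimal stand-in for StreamingBufferBlock, just enough to exercise
     * the comparison (hypothetical harness, not Suricata code). */
    typedef struct {
        uint64_t offset;
        uint64_t len;
    } Block;

    /* the existing InclusiveCompare() logic */
    static int CompareOld(const Block *lookup, const Block *intree)
    {
        const uint64_t lre = lookup->offset + lookup->len;
        const uint64_t tre = intree->offset + intree->len;
        if (lre <= intree->offset)
            return -1; // entirely before
        else if (lookup->offset < tre && lre <= tre)
            return 0;  // (some) overlap
        else
            return 1;  // entirely after
    }

    /* the patched logic */
    static int CompareNew(const Block *lookup, const Block *intree)
    {
        const uint64_t lre = lookup->offset + lookup->len;
        const uint64_t tre = intree->offset + intree->len;
        if (lre <= intree->offset)
            return -1; // entirely before
        if (tre <= lookup->offset)
            return 1;  // entirely after
        return 0;      // (some) overlap
    }

    int main(void)
    {
        /* lookup [15,25) overlaps the tail of intree [10,20) */
        const Block intree = { .offset = 10, .len = 10 };
        const Block lookup = { .offset = 15, .len = 10 };
        assert(CompareOld(&lookup, &intree) == 1); /* wrongly "entirely after" */
        assert(CompareNew(&lookup, &intree) == 0); /* overlap, as expected */
        return 0;
    }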


Related issues (3: 0 open, 3 closed)

Related to Suricata - Bug #5834: tcp/regions: list corruption (Closed, Victor Julien)
Related to Suricata - Bug #6041: ASSERT: !(sb->region.buf_offset != 0) (Closed, Victor Julien)
Related to Suricata - Bug #6117: tcp regions streaming buffer: assert failed (!((region->stream_offset == sbb->offset && region->buf_offset > sbb->len))), function StreamingBufferSBBGetData (Closed, Victor Julien)
#1

Updated by Victor Julien 12 months ago

  • Status changed from New to Assigned
  • Assignee changed from OISF Dev to Victor Julien
  • Target version changed from TBD to 7.0.0-rc2
#2

Updated by Victor Julien 12 months ago

@Todd Mortimer do you have a pcap to reproduce the issue?

#3

Updated by Todd Mortimer 12 months ago

Victor Julien wrote in #note-2:

@Todd Mortimer do you have a pcap to reproduce the issue?

I don't have a pcap to reproduce, sorry. I have core dumps but can't share them.

At the time of failure, we see values like region->buf_size = 743424 with a requested size of 335872. So the requested size is smaller than the existing region buffer, sometimes by a few KB, sometimes by about half (as above).

#4

Updated by Philippe Antoine 12 months ago

Quite high numbers, after the original 2048-byte allocation...

Could you print out more information? About the flow? (specific ports, flags like gap)
I wonder how you could get a pcap for these flows...

#5

Updated by Victor Julien 11 months ago

  • Related to Bug #5834: tcp/regions: list corruption added
#6

Updated by Victor Julien 11 months ago

  • Related to Bug #6041: ASSERT: !(sb->region.buf_offset != 0) added
#7

Updated by Victor Julien 11 months ago

  • Priority changed from Normal to High
#8

Updated by Todd Mortimer 11 months ago

Philippe Antoine wrote in #note-4:

Could you print out more information? About the flow? (specific ports, flags like gap)

No problem. I captured some info about the buffer state, session, stream and packet at the time this happens.

For an instance where it tried to shrink the region from 745472 to 460800 bytes (cfg.buf_size = 2048):

Region Info
region.buf_size: 745472
region.buf_offset: 35612
region.stream_offset: 897668

Streaming Buffer (SBBPrintList() output)
sbb offset 0, len 49120
gap offset 49120 len 848548
sbb offset 897668 len 1228
gap offset 898896 len 284896
sbb offset 1183792 len 4912
gap offset 1188704 len 1228
sbb offset 1189932 len 9824
gap offset 1199756 len 46664
sbb offset 1246420 len 6140
gap offset 1252560 len 335244
sbb offset 1587804 len 35612
gap offset 1623416 len 8596
sbb offset 1632012 len 11052

Packet Info:
packet.source port: 443
packet.dest port: 52140
packet.proto: 6
packet.flowflags: 5 = 0b0000_0101 = FLOW_PKT_TOSERVER | FLOW_PKT_ESTABLISHED
packet.payload_len: 1228
packet.flow->flow_state: 1 = FLOW_STATE_ESTABLISHED
packet.flow->flags: 1089802 = 0b0001_0000_1010_0001_0000_1010 = FLOW_TO_DST_SEEN | FLOW_TOSERVER_IPONLY_SET | FLOW_SGH_TOSERVER | FLOW_TS_PM_ALPROTO_DETECT_DONE | FLOW_TS_PE_ALPROTO_DETECT_DONE | FLOW_IPV4
packet.flow->alproto: 0 = ALPROTO_UNKNOWN
packet.flow->todstpktcnt: 99
packet.flow->tosrcpktcnt: 0
packet.flow->todstbytecnt: 129294
packet.flow->tosrcbytecnt: 0

TCP Header info
packet.tcphr.th_sport: 443
packet.tcphr.th_dport: 52140
packet.tcphr.th_seq: 1274407213
packet.tcphr.th_ack: 3686284819
packet.tcphr.th_offx2: 128
packet.tcphr.th_flags: 16
packet.tcphr.th_win: 546
packet.tcphr.th_sum: 11219
packet.tcphr.th_urp: 0

Stream Info
TcpStream.flags: 0
TcpStream.isn: 1273154652
TcpStream.next_seq: 1274797717
TcpStream.last_ack: 1274797717
TcpStream.base_seq: 1273154653

Session Info
TcpSession.state: 4 = TCP_ESTABLISHED
TcpSession.pstate: 0 = TCP_NONE
TcpSession.queue_len: 0
TcpSession.data_first_seen_dir: 4 = STREAM_TOSERVER
TcpSession.tcp_packet_flags: 24
TcpSession.flags: 3147 = STREAMTCP_FLAG_MIDSTREAM | STREAMTCP_FLAG_MIDSTREAM_ESTABLISHED | STREAMTCP_FLAG_TIMESTAMP | STREAMTCP_FLAG_ASYNC | STREAMTCP_FLAG_SACKOK | STREAMTCP_FLAG_LOSSY_BY_LIBERAL
TcpSession.reassembly_depth: 134217728

I can reproduce this pretty reliably (takes some time, but it happens at least daily), so I can pull more info about the stream or the streaming buffer state if needed.

Looking at this output, it seems like the stream is one-sided, attached midstream, and missing a lot of packets, so it has lots of gaps. The packet we are trying to insert is closer to the beginning of the stream than the end, so it has selected the 3rd region in the streaming buffer to insert into.

#9

Updated by Victor Julien 11 months ago

Does this still happen with the current git master? I've addressed 2 possible related issues in #5834 and #6041

#10

Updated by Todd Mortimer 11 months ago

Victor Julien wrote in #note-9:

Does this still happen with the current git master? I've addressed 2 possible related issues in #5834 and #6041

Checking now; I will update in a few days after letting it run for a while to see if it happens.

Thank you!

#11

Updated by Philippe Antoine 11 months ago

  • Related to Bug #6117: tcp regions streaming buffer: assert failed (!((region->stream_offset == sbb->offset && region->buf_offset > sbb->len))), function StreamingBufferSBBGetData added
#12

Updated by Victor Julien 11 months ago

  • Status changed from Assigned to In Review
#13

Updated by Victor Julien 11 months ago

  • Status changed from In Review to Closed

Two more sets of fixes; hoping that this case is addressed by them as well.

#14

Updated by Todd Mortimer 11 months ago

Victor Julien wrote in #note-9:

Does this still happen with the current git master? I've addressed 2 possible related issues in #5834 and #6041

Okay, we have had this in prod for a couple of days and haven't had any issues with GrowRegionToSize() underflowing, so it looks like that is definitely fixed!

We do see an assert being hit in SBBPrune() now, so I will pull in the latest changes and put them through the test/deploy process to see!

Thank you!

