Bug #7950

closed

Potentially incorrect decoding of quoted-printable mime text attachments

Added by Marko Jahnke 26 days ago. Updated 7 days ago.

Status: Closed
Priority: Normal

Description

If I am correct, there might be a problem with decoding quoted-printable encoded email text attachments in Suricata.

In addition, if there is an empty line at the end of the last attachment (before the ".") it is also appended to the decoded file. AFAIK, that should not be the case.

I believe I found a test case where Suricata 7.0.12, Suricata 8.0.1, and the GMime library (as a reference) all produce different text file output and therefore different checksums, which might cause an IoC matching problem.

  • "testcase.smtp" is the SMTP stream extracted from the PCAP using Wireshark's "Follow TCP Stream".
  • "Attachment2-gmime" is the output generated by the GMime library.
  • "Attachment2-suri7" and "Attachment2-suri8" are the respective outputs of the above Suricata versions with filestore activated.
$ diff Attachment2-gmime Attachment2-suri7 
119c119
< ===================================================================
---
> ==3D================================================================
246,247c246
< +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, 
< const gchar**  ret_url)
---
> +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, const gchar**  ret_url)
451c450
< ===================================================================
---
> ==========================================3D========================
1037a1037
>
$ diff Attachment2-gmime Attachment2-suri8 
119c119
< ===================================================================
---
> ==3D================================================================
246,247c246
< +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, 
< const gchar**  ret_url)
---
> +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, const gchar**  ret_url)
451c450
< ===================================================================
---
> ==========================================D========================
1037a1037
> 

Hopefully I did not make a mistake. But if I am correct, there might be unwanted IoC differences.

Best regards,
MaJa
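
The checksum concern in the description can be illustrated with a minimal sketch. It only assumes that file-based IoCs are matched on a cryptographic digest of the extracted file; the helper name `sha256_hex` and the sample bytes are hypothetical and not taken from the test case.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical decoded attachment content (not taken from testcase.smtp).
reference = b"+static gboolean f(void)\n"
# The same content with one extra trailing empty line appended.
with_extra_newline = reference + b"\n"

# A single additional octet is enough to change the digest.
print(sha256_hex(reference) == sha256_hex(with_extra_newline))  # False
```

Any decoder difference, even a single appended newline, therefore yields a file that no longer matches a hash-based indicator computed from the reference output.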


Files

testcase.smtp (42.6 KB), Marko Jahnke, 09/24/2025 01:47 PM
testcase.pcap (131 KB), Marko Jahnke, 09/24/2025 01:47 PM
Attachment2-gmime (33.4 KB), Marko Jahnke, 09/24/2025 01:56 PM
Attachment2-suri7 (33.4 KB), Marko Jahnke, 09/24/2025 01:56 PM
Attachment2-suri8 (33.4 KB), Marko Jahnke, 09/24/2025 01:56 PM
Bildschirmfoto_2025-10-03_15-56-16.png (135 KB), Screenshot of testcase.smtp in Thunderbird/Trixie, Albrecht Dreß, 10/03/2025 02:04 PM
Bildschirmfoto_2025-10-06_10-27-12.png (53 KB), Decodes Wireshark output, Albrecht Dreß, 10/06/2025 08:33 AM
Bildschirmfoto_2025-10-06_10-24-33.png (87.2 KB), Wireshark HEX dump, Albrecht Dreß, 10/06/2025 08:33 AM

Subtasks 2 (0 open, 2 closed)

Bug #7961: Potentially incorrect decoding of quoted-printable mime text attachments (8.0.x backport), Closed, Philippe Antoine
Bug #7962: Potentially incorrect decoding of quoted-printable mime text attachments (7.0.x backport), Closed, Philippe Antoine
Actions #1

Updated by Marko Jahnke 26 days ago

Ma Ja wrote:

If I am correct, there might be a problem with decoding quoted-printable encoded email text attachments in Suricata.

Of course, by "attachments" I meant MIME multipart body parts.

Actions #2

Updated by Victor Julien 21 days ago

  • Status changed from New to Assigned
  • Assignee changed from OISF Dev to Philippe Antoine
  • Target version changed from TBD to 9.0.0-beta1

@Philippe Antoine can you check this and mark for backport(s) if needed?

Actions #3

Updated by Philippe Antoine 21 days ago

Suricata 8 and 7 seem incorrect.

So does GMime, in a different way, when compared to the IMF object exported by Wireshark.

Actions #4

Updated by Philippe Antoine 21 days ago

  • Label Needs backport to 7.0, Needs backport to 8.0 added

Not really a backport for 7, but a fix for the C parser...

Actions #5

Updated by OISF Ticketbot 21 days ago

  • Subtask #7961 added
Actions #6

Updated by OISF Ticketbot 21 days ago

  • Label deleted (Needs backport to 8.0)
Actions #7

Updated by OISF Ticketbot 21 days ago

  • Subtask #7962 added
Actions #8

Updated by OISF Ticketbot 21 days ago

  • Label deleted (Needs backport to 7.0)
Actions #9

Updated by Philippe Antoine 21 days ago

  • Status changed from Assigned to In Review
Actions #10

Updated by Marko Jahnke 19 days ago

@Philippe Antoine wrote:

So does GMime, in a different way, when compared to the IMF object exported by Wireshark.

Is it possible to tell what GMime does wrong? If we use that as a reference, we might also have a problem.

Actions #11

Updated by Philippe Antoine 19 days ago

< +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, 
< const gchar**  ret_url)
---
> +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, const gchar**  ret_url)

I think GMime wrongly inserts a newline (compared to Wireshark and Suricata).

Actions #12

Updated by Albrecht Dreß 18 days ago

(Sorry for jumping into this thread, a colleague pointed me to it as I use GMime in several projects, inter alia the MUA Balsa – thus I'm interested in any possible bugs in that library…)

The dump from the PCAP actually looks a little odd at this point:

000030d0  73 74 61 74 69 63 20 67  62 6f 6f 6c 65 61 6e 20  |static gboolean |
000030e0  72 65 6c 61 74 65 64 5f  75 72 6c 5f 73 74 72 69  |related_url_stri|
000030f0  6e 67 5f 63 62 28 66 69  65 6c 64 5f 69 6e 66 6f  |ng_cb(field_info|
00003100  20 2a 66 69 6e 66 6f 2c  20 67 62 6f 6f 6c 65 61  | *finfo, gboolea|
00003110  6e 20 64 6f 69 74 2c 20  3d 0a 0a 63 6f 6e 73 74  |n doit, =..const|
00003120  20 67 63 68 61 72 2a 2a  20 20 72 65 74 5f 75 72  | gchar**  ret_ur|
00003130  6c 29 3d 30 41 3d 0a 2b  7b 3d 30 41 3d 0a 2b 3d  |l)=0A=.+{=0A=.+=|

Apparently, the sending MUA did not convert the line breaks into RFC 5322 (i.e. CRLF) sequences. However, RFC 2045, Section 6.7, Clause 4 (Line Breaks) states:

A line break in a text body, represented as a CRLF sequence in the text canonical form, must be represented by a (RFC 822) line break, which is also a CRLF sequence, in the Quoted-Printable encoding.
[…]
Note that many implementations may elect to encode the local representation of various content types directly rather than converting to canonical form first, encoding, and then converting back to local representation. In particular, this may apply to plain text material on systems that use newline conventions other than a CRLF terminator sequence. Such an implementation optimization is permissible, but only when the combined canonicalization-encoding step is equivalent to performing the three steps separately.

IMHO the GMime decoder decodes the input at offset 0x3118 correctly according to this optimisation: the two octets 0x3d 0x0a represent a soft line break, which is removed according to Clause 5 of the aforementioned standard, whilst the hard line break at offset 0x311a is preserved. It may be confusing that the attachment is an application/octet-stream, but the standard does not explicitly rule out using the “simplified” line breaks for this content type. This really looks like a somewhat special corner case…

Or did I miss something here?
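
The two rules referred to above (encoded octets and soft line breaks) can be sketched as a small decoder. This is a minimal illustration only, not the GMime or Suricata implementation; the function name `decode_quoted_printable` is hypothetical, and accepting a bare LF as a line break is an assumption that follows the "local newline" optimisation quoted from RFC 2045.

```python
import re


def decode_quoted_printable(data: bytes) -> bytes:
    """Decode quoted-printable bytes along the lines of RFC 2045, Section 6.7."""
    # Soft line break: "=" directly followed by a line break is removed.
    # Assumption: both CRLF and a bare LF are accepted as the line break.
    data = re.sub(rb"=\r?\n", b"", data)
    # Encoded octet: "=XX" with two hex digits stands for the octet 0xXX.
    return re.sub(
        rb"=([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


# Input as shown in the dump above: soft break "=\n" followed by a hard "\n".
print(decode_quoted_printable(b"doit, =\n\nconst"))        # b'doit, \nconst'
# "=0A" is an encoded LF; the trailing "=\n" is a soft break and vanishes.
print(decode_quoted_printable(b"ret_url)=0A=\n+{=0A=\n"))  # b'ret_url)\n+{\n'
```

With the bytes from the dump, the soft line break disappears and the second 0x0a survives as a hard line break, which is the extra newline seen in the GMime output.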

Actions #13

Updated by Albrecht Dreß 17 days ago

Albrecht Dreß wrote in #note-12:

IMHO the GMime decoder decodes the input at offset 0x3118 correctly according to this optimisation: the two octets 0x3d 0x0a represent a soft line break, which is removed according to Clause 5 of the aforementioned standard, whilst the hard line break at offset 0x311a is preserved. […]

As an additional test, I loaded testcase.smtp into Thunderbird on Trixie and opened the 2nd attachment (TB calls an external application for that). Apparently TB also keeps the newline (which breaks the patch file, but that seems to be an issue of the MUA producing the message):

Screenshot of testcase.smtp in Thunderbird/Trixie

Actions #14

Updated by Philippe Antoine 17 days ago

The patch file indeed looks correct without the newline...

Actions #15

Updated by Philippe Antoine 14 days ago

  • Status changed from In Review to Resolved

https://github.com/OISF/suricata/pull/13937

@albrecht thanks for the feedback.

your pcap dump seems wrong, I see

00003194  2c 20 3d 0d                                        , =.
00003198  0a 63 6f 6e 73 74 20 67  63 68 61 72 2a 2a 20 20   .const g char**  

So, 3d0d0a and not 3d0a0a as you posted in #12

3d0d0a seems a legit soft line break
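
To make the difference between the two byte sequences concrete, here is a small sketch using Python's standard `quopri` module as a stand-in decoder (an assumption for illustration; it is neither Suricata's nor GMime's code). The variable names are hypothetical; the hex strings are the octets around the line break followed by "const".

```python
import quopri

# ", =" CR LF "const": the sequence seen in the pcap (3d 0d 0a).
crlf_variant = bytes.fromhex("2c203d0d0a636f6e7374")
# ", =" LF LF "const": the sequence posted in #12 (3d 0a 0a).
lf_variant = bytes.fromhex("2c203d0a0a636f6e7374")

# "=" CR LF is a single soft line break, so the line is simply joined.
print(quopri.decodestring(crlf_variant))  # b', const'
# "=" LF is a soft line break, and the second LF remains as a hard break.
print(quopri.decodestring(lf_variant))    # b', \nconst'
```

The first result matches the output without the extra newline, the second reproduces the extra newline from the GMime diff.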

Actions #16

Updated by Albrecht Dreß 14 days ago

Philippe Antoine wrote in #note-15:

@albrecht thanks for the feedback.

your pcap dump seems wrong, I see
[...]

So, 3d0d0a and not 3d0a0a as you posted in #12

3d0d0a seems a legit soft line break

Yes, you're right, of course!

Looking at the Wireshark hex display of the re-assembled TCP stream, I actually see
Wireshark HEX dump
as you said.

Switching Wireshark to ASCII display, saving the re-assembled TCP stream to a file, and running hd on it, I get
Decodes Wireshark output
I.e. I probably didn't understand how Wireshark's ASCII export actually works (or there is a glitch in Wireshark's export?), which led to the confusion… After fixing that byte, GMime, too, produces the proper output.

Thanks again for the clarification!
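
A quick way to spot this kind of export artefact is to look for LF octets that are not preceded by CR in the saved stream. The following is a minimal sketch; the helper name `find_bare_line_feeds` is hypothetical, and it assumes the stream is expected to use CRLF line breaks throughout.

```python
def find_bare_line_feeds(data: bytes) -> list[int]:
    """Return the offsets of LF octets (0x0a) not preceded by CR (0x0d)."""
    return [
        offset
        for offset, octet in enumerate(data)
        if octet == 0x0A and (offset == 0 or data[offset - 1] != 0x0D)
    ]


# Sequence as seen in the hex display: proper CRLF, nothing reported.
print(find_bare_line_feeds(b", =\r\nconst"))  # []
# Sequence from the saved ASCII export: both LF octets are reported.
print(find_bare_line_feeds(b", =\n\nconst"))  # [3, 4]
```

Applied to a saved stream, for example `find_bare_line_feeds(open("testcase.smtp", "rb").read())`, a non-empty result suggests the dump should be compared against the raw hex view before it is used as decoder input.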

Actions #17

Updated by Philippe Antoine 7 days ago

  • Status changed from Resolved to Closed