Bug #7950: mime: incorrect decoding of quoted-printable text attachments - Suricata - Open Information Security Foundation

Bug #7950

closed

mime: incorrect decoding of quoted-printable text attachments

Added by Marko Jahnke 3 months ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee: Philippe Antoine
Target version: 9.0.0-beta1
Affected Versions: 8.0.2
Effort:
Difficulty:
Label:

Description

If I am correct, Suricata may decode quoted-printable encoded email text attachments incorrectly.

In addition, if there is an empty line at the end of the last attachment (before the ".") it is also appended to the decoded file. AFAIK, that should not be the case.

I believe I found a test case where Suricata 7.0.12, Suricata 8.0.1, and the GMime library (as a reference) all produce different text file output and checksums, which could cause IoC matching problems.

"testcase.smtp" is the SMTP stream from the PCAP, extracted with Wireshark's "Follow TCP Stream".
"Attachment2-gmime" is the output generated by the GMime library.
"Attachment2-suri7" and "Attachment2-suri8" are the respective outputs of the Suricata versions above with filestore activated.

$ diff Attachment2-gmime Attachment2-suri7 
119c119
< ===================================================================
---
> ==3D================================================================
246,247c246
< +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, 
< const gchar**  ret_url)
---
> +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, const gchar**  ret_url)
451c450
< ===================================================================
---
> ==========================================3D========================
1037a1037
>

$ diff Attachment2-gmime Attachment2-suri8 
119c119
< ===================================================================
---
> ==3D================================================================
246,247c246
< +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, 
< const gchar**  ret_url)
---
> +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, const gchar**  ret_url)
451c450
< ===================================================================
---
> ==========================================D========================
1037a1037
>

Hopefully I did not make a mistake. But if I am correct, there might be unwanted IoC differences.
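To illustrate why even a one-byte decoding difference matters for hash-based IoCs, here is a minimal Python sketch. It is not taken from Suricata or GMime; the helper names and the sample byte strings are hypothetical and only mimic the kind of differences shown in the diffs above.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other, e.g. an extra trailing newline.
        return min(len(a), len(b))
    return None


# Hypothetical sample data: a separator line decoded correctly and one with
# a leftover "3D" from an undecoded "=3D" escape.
reference = b"=" * 67 + b"\n"
leftover_escape = b"==3D" + b"=" * 64 + b"\n"
extra_newline = reference + b"\n"

print(first_difference(reference, leftover_escape))
print(first_difference(reference, extra_newline))
print(sha256_hex(reference) == sha256_hex(leftover_escape))
```

Any such difference, including a single appended empty line, changes the digest, so a file hash taken from one decoder will not match the hash of the same attachment extracted by another.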

Best regards,
MaJa

Files

testcase.smtp (42.6 KB), Marko Jahnke, 09/24/2025 01:47 PM
testcase.pcap (131 KB), Marko Jahnke, 09/24/2025 01:47 PM
Attachment2-gmime (33.4 KB), Marko Jahnke, 09/24/2025 01:56 PM
Attachment2-suri7 (33.4 KB), Marko Jahnke, 09/24/2025 01:56 PM
Attachment2-suri8 (33.4 KB), Marko Jahnke, 09/24/2025 01:56 PM
Bildschirmfoto_2025-10-03_15-56-16.png (135 KB), Screenshot of testcase.smtp in Thunderbird/Trixie, Albrecht Dreß, 10/03/2025 02:04 PM
Bildschirmfoto_2025-10-06_10-27-12.png (53 KB), Decodes Wireshark output, Albrecht Dreß, 10/06/2025 08:33 AM
Bildschirmfoto_2025-10-06_10-24-33.png (87.2 KB), Wireshark HEX dump, Albrecht Dreß, 10/06/2025 08:33 AM

Subtasks 2 (0 open — 2 closed)

Updated by Marko Jahnke 3 months ago

Ma Ja wrote:

If I am correct, there might be a decoding problem with decoding quoted-printable encoded email text attachments in Suricata.

Of course, with "attachments" I meant MIME multipart bodyparts.

Updated by Victor Julien 3 months ago

Status changed from New to Assigned
Assignee changed from OISF Dev to Philippe Antoine
Target version changed from TBD to 9.0.0-beta1

@Philippe Antoine can you check this and mark for backport(s) if needed?

Updated by Philippe Antoine 3 months ago

Suricata 8 and 7 seem incorrect.

GMime also seems incorrect, in a different way, when compared to the IMF object exported by Wireshark.

Updated by Philippe Antoine 3 months ago

Label Needs backport to 7.0, Needs backport to 8.0 added

Not really a backport for 7, but a fix for the C parser...

Updated by OISF Ticketbot 3 months ago

Subtask #7961 added

Updated by OISF Ticketbot 3 months ago

Label deleted (~~Needs backport to 8.0~~)

Updated by OISF Ticketbot 3 months ago

Subtask #7962 added

Updated by OISF Ticketbot 3 months ago

Label deleted (~~Needs backport to 7.0~~)

Updated by Philippe Antoine 3 months ago

Status changed from Assigned to In Review

https://github.com/OISF/suricata/pull/13922

#10

Updated by Marko Jahnke 3 months ago

@Philippe Antoine wrote:

GMime also seems incorrect, in a different way, when compared to the IMF object exported by Wireshark.

Is it possible to tell what GMime does wrong? If we use that as a reference, we might also have a problem.

#11

Updated by Philippe Antoine 3 months ago

< +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, 
< const gchar**  ret_url)
---
> +static gboolean related_url_string_cb(field_info *finfo, gboolean doit, const gchar**  ret_url)

I think GMime wrongly inserts a newline here (compared to Wireshark and Suricata).

#12

Updated by Albrecht Dreß 3 months ago

(Sorry for jumping into this thread, a colleague pointed me to it as I use GMime in several projects, inter alia the MUA Balsa – thus I'm interested in any possible bugs in that library…)

The dump from the PCAP actually looks a little odd at this point:

000030d0  73 74 61 74 69 63 20 67  62 6f 6f 6c 65 61 6e 20  |static gboolean |
000030e0  72 65 6c 61 74 65 64 5f  75 72 6c 5f 73 74 72 69  |related_url_stri|
000030f0  6e 67 5f 63 62 28 66 69  65 6c 64 5f 69 6e 66 6f  |ng_cb(field_info|
00003100  20 2a 66 69 6e 66 6f 2c  20 67 62 6f 6f 6c 65 61  | *finfo, gboolea|
00003110  6e 20 64 6f 69 74 2c 20  3d 0a 0a 63 6f 6e 73 74  |n doit, =..const|
00003120  20 67 63 68 61 72 2a 2a  20 20 72 65 74 5f 75 72  | gchar**  ret_ur|
00003130  6c 29 3d 30 41 3d 0a 2b  7b 3d 30 41 3d 0a 2b 3d  |l)=0A=.+{=0A=.+=|

Apparently, the sending MUA did not convert the line breaks into RFC 5322 (i.e. CRLF) sequences. However, RFC 2045, Section 6.7, Clause 4 (Line Breaks) states:

A line break in a text body, represented as a CRLF sequence in the text canonical form, must be represented by a (RFC 822) line break, which is also a CRLF sequence, in the Quoted-Printable encoding.
[…]
Note that many implementations may elect to encode the local representation of various content types directly rather than converting to canonical form first, encoding, and then converting back to local representation. In particular, this may apply to plain text material on systems that use newline conventions other than a CRLF terminator sequence. Such an implementation optimization is permissible, but only when the combined canonicalization-encoding step is equivalent to performing the three steps separately.

IMHO the GMime decoder decodes the input at offset 0x3118 correctly according to this optimisation: the two octets 0x3d 0x0a represent the soft line break, which is removed according to Clause 5 of the aforementioned standard, whilst the hard line break at offset 0x311a is preserved. It may be confusing that the attachment is an application/octet-stream, but the standard does not explicitly rule out using the “simplified” line breaks for this content type. This really looks like a somewhat special corner case…

Or did I miss something here?
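The soft versus hard line break distinction discussed above can be sketched in a few lines of Python. This is a simplified illustration of the RFC 2045, Section 6.7 rules, not the decoder used by Suricata or GMime; the function name decode_qp is hypothetical, and accepting a bare LF after "=" as a soft line break is an assumption matching the "optimisation" reading quoted above.

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def decode_qp(data: bytes) -> bytes:
    """Decode quoted-printable bytes (simplified RFC 2045, section 6.7)."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte != 0x3D:
            # Not '=': copy through unchanged, hard line breaks included.
            out.append(byte)
            i += 1
        elif data[i + 1:i + 3] == b"\r\n":
            # Soft line break in CRLF form (3d 0d 0a): emit nothing.
            i += 3
        elif data[i + 1:i + 2] == b"\n":
            # Soft line break in bare-LF form (3d 0a): emit nothing.
            i += 2
        elif i + 2 < n and data[i + 1] in HEX_DIGITS and data[i + 2] in HEX_DIGITS:
            # "=XX" escape: emit the octet with hex value XX.
            out.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            # Malformed escape: keep the '=' literally.
            out.append(byte)
            i += 1
    return bytes(out)


# 3d 0d 0a: the soft line break vanishes and the line is joined.
print(decode_qp(b"doit, =\r\nconst gchar**"))
# 3d 0a 0a: the soft line break vanishes, the following LF survives.
print(decode_qp(b"doit, =\n\nconst gchar**"))
```

With these rules, the byte sequence 3d 0a 0a yields a newline between "doit, " and "const", while 3d 0d 0a joins the two halves into one line, so the exact bytes in the capture decide which output is correct.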

#13

Updated by Albrecht Dreß 3 months ago

File Bildschirmfoto_2025-10-03_15-56-16.png added

Albrecht Dreß wrote in #note-12:

IMHO the GMime decoder decodes the input at offset 0x3118 correctly according to this optimisation: the two octets 0x3d 0x0a represent the soft line break, which is removed according to Clause 5 of the aforementioned standard, whilst the hard line break at offset 0x311a is preserved. […]

As an additional test, I loaded testcase.smtp into Thunderbird on Trixie and opened the 2nd attachment (TB calls an external application for that). Apparently TB also keeps the newline (which breaks the patch file, but that seems to be an issue of the MUA producing the message):

[Screenshot of testcase.smtp in Thunderbird/Trixie]

#14

Updated by Philippe Antoine 3 months ago

The patch file indeed looks correct without the newline...

#15

Updated by Philippe Antoine 3 months ago

Status changed from In Review to Resolved

https://github.com/OISF/suricata/pull/13937

@albrecht thanks for the feedback.

your pcap dump seems wrong, I see

00003194  2c 20 3d 0d                                        , =.
00003198  0a 63 6f 6e 73 74 20 67  63 68 61 72 2a 2a 20 20   .const g char**

So, 3d0d0a and not 3d0a0a as you posted in #12

3d0d0a seems a legit soft line break
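One way to check which byte sequence a capture really contains, independent of any viewer or export, is to count the soft line break forms in the raw bytes. The following Python sketch does that; the function name soft_break_forms is hypothetical, and it assumes the input is only the quoted-printable body (elsewhere, for example in base64 data, a trailing "=" would be miscounted).

```python
import re
from collections import Counter


def soft_break_forms(qp_body: bytes) -> Counter:
    """Count soft line breaks by terminator: '=' CRLF versus '=' bare LF."""
    forms = Counter()
    for match in re.finditer(rb"=(\r\n|\n)", qp_body):
        key = "CRLF" if match.group(1) == b"\r\n" else "LF"
        forms[key] += 1
    return forms


# Bytes as seen in the raw TCP stream (3d 0d 0a).
print(soft_break_forms(b"doit, =\r\nconst gchar**"))
# Bytes as they appeared in the dump posted in #12 (3d 0a 0a).
print(soft_break_forms(b"doit, =\n\nconst gchar**"))
```

If the count of bare-LF soft breaks changes between the raw capture and an exported copy, the export step altered the line endings.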

#16

Updated by Albrecht Dreß 3 months ago

File Bildschirmfoto_2025-10-06_10-24-33.png added
File Bildschirmfoto_2025-10-06_10-27-12.png added

Philippe Antoine wrote in #note-15:

@albrecht thanks for the feedback.

your pcap dump seems wrong, I see
[...]

So, 3d0d0a and not 3d0a0a as you posted in #12

3d0d0a seems a legit soft line break

Yes, you're right, of course!

Looking at the Wireshark hex display of the re-assembled TCP stream, I actually see what you described:
[Screenshot: Wireshark HEX dump]

After switching Wireshark to ASCII display, saving the re-assembled TCP stream to a file, and running hd on it, I get:
[Screenshot: Decodes Wireshark output]
I.e. I probably didn't understand how Wireshark's ASCII export actually works (or there is a glitch in Wireshark's export?), which led to the confusion… After fixing that byte, GMime, too, produces the proper output.

Thanks again for the clarification!

#17

Updated by Philippe Antoine 2 months ago

Status changed from Resolved to Closed

#18

Updated by Shivani Bhardwaj about 2 months ago

Subject changed from Potentially incorrect decoding of quoted-printable mime text attachments to mime: incorrect decoding of quoted-printable text attachments

#19

Updated by Marko Jahnke about 1 month ago

Hi,

When checking version 8.0.2 with the above test pcap, I still see the difference in the decoded file in the filestore.

Are you sure that this has been fixed?

Best regards,
MaJa

#20

Updated by Philippe Antoine about 1 month ago

I think it is fixed, and it is tested by the SV test mime-quoted-printable, which uses this pcap.
What do you see wrong with this test?

#21

Updated by Marko Jahnke about 1 month ago

Affected Versions 8.0.2 added
Affected Versions deleted (~~7.0.12, 8.0.1~~)

I compiled 7.0.13 from source, and the extracted attachment was correctly decoded (i.e., there was no difference to the GMime output). Thus, the bug was successfully fixed. Thank you for that.

But I did it for 8.0.2 as well. In this case, the attachment did not change in comparison to the "Attachment2-suri8" above.

I repeated the download, compile, and run steps twice, with the same result.
Maybe I made a mistake, but I do not see where.

#22

Updated by Marko Jahnke about 1 month ago

OK, never mind.

It seems I compared a leftover file from the test with 8.0.1.

I am sorry for the confusion.

Thanks again.
Marko

#23

Updated by Philippe Antoine about 1 month ago

Thanks for double checking :-)
