Feature #5872


file structure awareness - precise identification of fields in file structs

Added by James Emery-Callcott about 1 year ago. Updated about 1 year ago.

Target version:


Earlier today, I was working through a couple of ClamAV vulnerabilities (CVE-2023-20032, CVE-2023-20052). As I began traversing the structure of a .dmg file, I started thinking about how often this process involves considering file magic, calculating offsets, and verifying whether certain fields are static, and how time-consuming this ends up being. That led me to the idea of having Suricata be aware of various file structures.

The ask
When we (ET) write signatures for a specific file type, we include the file magic where possible as the signature base before we dive into building the logic for what we want to detect. To help explain this ask, I'll walk through the thought process behind CVE-2023-20052 detection, both for signatures we can write today and for signatures we could write if this feature were implemented.

Essentially, detection for this vulnerability revolves around looking for a pattern indicative of XXE within the property list of a .dmg file. I've included the .dmg struct below for reference.

typedef struct {
        char     Signature[4];          // Magic ('koly')
        uint32_t Version;               // Current version is 4
        uint32_t HeaderSize;            // sizeof(this), always 512
        uint32_t Flags;                 // Flags
        uint64_t RunningDataForkOffset; //
        uint64_t DataForkOffset;        // Data fork offset (usually 0, beginning of file)
        uint64_t DataForkLength;        // Size of data fork (usually up to the XMLOffset, below)
        uint64_t RsrcForkOffset;        // Resource fork offset, if any
        uint64_t RsrcForkLength;        // Resource fork length, if any
        uint32_t SegmentNumber;         // Usually 1, may be 0
        uint32_t SegmentCount;          // Usually 1, may be 0
        uuid_t   SegmentID;             // 128-bit GUID identifier of segment (if SegmentNumber !=0)

        uint32_t DataChecksumType;      // Data fork
        uint32_t DataChecksumSize;      //  Checksum Information
        uint32_t DataChecksum[32];      // Up to 128-bytes (32 x 4) of checksum

        uint64_t XMLOffset;             // Offset of property list in DMG, from beginning
        uint64_t XMLLength;             // Length of property list
        uint8_t  Reserved1[120];        // 120 reserved bytes - zeroed

        uint32_t ChecksumType;          // Master
        uint32_t ChecksumSize;          //  Checksum information
        uint32_t Checksum[32];          // Up to 128-bytes (32 x 4) of checksum

        uint32_t ImageVariant;          // Commonly 1
        uint64_t SectorCount;           // Size of DMG when expanded, in sectors

        uint32_t reserved2;             // 0
        uint32_t reserved3;             // 0 
        uint32_t reserved4;             // 0

} __attribute__((__packed__)) UDIFResourceFile;
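
Since writing a rule by hand means working out where each field sits, here is a minimal Python sketch that derives the offsets from the struct above. The `KOLY_FIELDS` table and the `field_layout` helper are names made up for illustration. The sketch assumes the struct is packed exactly as listed (no padding), so it only counts field sizes and makes no claim about how Suricata would do this.

```python
import struct

# Field layout transcribed from the UDIFResourceFile struct above.
# Assumption: packed struct, no padding between fields.
KOLY_FIELDS = [
    ("Signature", "4s"),
    ("Version", "I"),
    ("HeaderSize", "I"),
    ("Flags", "I"),
    ("RunningDataForkOffset", "Q"),
    ("DataForkOffset", "Q"),
    ("DataForkLength", "Q"),
    ("RsrcForkOffset", "Q"),
    ("RsrcForkLength", "Q"),
    ("SegmentNumber", "I"),
    ("SegmentCount", "I"),
    ("SegmentID", "16s"),       # uuid_t, 128 bits
    ("DataChecksumType", "I"),
    ("DataChecksumSize", "I"),
    ("DataChecksum", "128s"),   # uint32_t[32]
    ("XMLOffset", "Q"),
    ("XMLLength", "Q"),
    ("Reserved1", "120s"),
    ("ChecksumType", "I"),
    ("ChecksumSize", "I"),
    ("Checksum", "128s"),       # uint32_t[32]
    ("ImageVariant", "I"),
    ("SectorCount", "Q"),
    ("reserved2", "I"),
    ("reserved3", "I"),
    ("reserved4", "I"),
]


def field_layout(fields):
    """Return ({name: (offset, size)}, total_size) for a packed field list."""
    layout = {}
    pos = 0
    for name, fmt in fields:
        # ">" selects standard sizes with no alignment padding.
        size = struct.calcsize(">" + fmt)
        layout[name] = (pos, size)
        pos += size
    return layout, pos


KOLY_LAYOUT, KOLY_SIZE = field_layout(KOLY_FIELDS)

if __name__ == "__main__":
    for name in ("SegmentNumber", "XMLOffset", "XMLLength"):
        offset, size = KOLY_LAYOUT[name]
        print(f"{name}: offset {offset}, {size} bytes")
    print("total size:", KOLY_SIZE)
```

With the layout as transcribed, SegmentNumber lands at offset 56, XMLOffset at 216 and XMLLength at 224, and the total comes to 512 bytes, which agrees with the HeaderSize comment. These are offsets from the start of the struct, so a relative offset in a rule would additionally depend on where the detection pointer sits after the magic match.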


To implement the logic described above today, we would first write a content match for the magic at offset 0. Next, we need to check SegmentNumber because (I'm guessing) it determines whether the SegmentID UUID is present or not. Then we need to calculate where the XMLOffset field sits, extract its value with byte_extract, do the same for XMLLength, and finally do a content match that uses offset and depth with the byte_extract values to inspect the XML content within the property list.

That looks something like this:

content:"|6b 6f 6c 79|"; startswith; byte_test:4,!=,0,44,relative; byte_extract:8,64,xml_offset,relative; byte_extract:8,72,xml_length,relative; content:"!ENTITY|20|"; fast_pattern; offset:xml_offset; depth:xml_length;
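
To make the intent of that rule easier to follow, here is a rough Python sketch of the same checks run against an in-memory buffer. It is an illustration only, not how Suricata evaluates rules. The function name `xxe_indicator` and the `koly_pos` parameter are hypothetical, the field offsets are the ones implied by the struct above (assuming it is packed), and the fields are assumed to be big-endian.

```python
import struct

# Offsets from the start of the koly block, implied by the struct above.
# Assumptions: packed struct, big-endian integer fields.
SEGMENT_NUMBER_OFF = 56
XML_OFFSET_OFF = 216
XML_LENGTH_OFF = 224
KOLY_SIZE = 512


def xxe_indicator(data: bytes, koly_pos: int = 0) -> bool:
    """Return True if the property list contains an entity declaration."""
    koly = data[koly_pos:koly_pos + KOLY_SIZE]

    # content:"|6b 6f 6c 79|"; startswith;
    if len(koly) < KOLY_SIZE or koly[:4] != b"koly":
        return False

    # byte_test on SegmentNumber: must be non-zero, mirroring the rule.
    (segment_number,) = struct.unpack_from(">I", koly, SEGMENT_NUMBER_OFF)
    if segment_number == 0:
        return False

    # byte_extract for xml_offset and xml_length (two adjacent 8-byte fields).
    xml_offset, xml_length = struct.unpack_from(">QQ", koly, XML_OFFSET_OFF)

    # content:"!ENTITY|20|"; offset:xml_offset; depth:xml_length;
    plist = data[xml_offset:xml_offset + xml_length]
    return b"!ENTITY " in plist
```

`koly_pos` defaults to 0 only to mirror the startswith in the rule above; a caller would pass whatever position the koly block actually has in the buffer being inspected.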

If this feature were present, we could do something like the following:

Map magic bytes from the file struct to the bytes at the current detection pointer, based on the offset provided:

file.magic:0,dmg,relative;

Assuming file.magic returned true, file.struct is now active, granting access to the fields within that file struct. How a field would be used is still up in the air for me, but here are 3 possibilities. The 3rd is an addition to byte_test that allows the <num of bytes> value to be taken from the length of a file.field when one is provided.

file.struct:SegmentNumber; byte_test:0,!=,0x0,0;

file.struct; file.field:set,SegmentNumber; byte_test:0,!=,0x0,0; file.field:unset,SegmentNumber;

file.struct; byte_test:SegmentNumber,!=,0x0,0;

Here is where the functionality, and how to represent it in rule syntax, gets somewhat tricky.

file.struct:XMLOffset,XMLLength; content:"!ENTITY|20|"; offset:XMLOffset; depth:XMLLength;

This would result in the following (partial) signature. It uses the snippets from above that I prefer, though they may not be the best way of implementing such a feature:

file.magic:0,dmg,relative; file.struct:SegmentNumber; byte_test:0,!=,0x0,0; file.struct:XMLOffset,XMLLength; content:"!ENTITY|20|"; fast_pattern; offset:XMLOffset; depth:XMLLength;
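
One way to picture what the proposed keywords would have to do internally is a per-filetype table that resolves field names to an offset and a length. The Python sketch below is purely hypothetical: `FILE_STRUCTS`, `field_value` and `match_in_window` are invented names, not Suricata APIs. It lists only the fields used in this ticket, with offsets implied by the struct above, and assumes packed, big-endian fields.

```python
# Hypothetical registry: file type -> {field name: (offset, size in bytes)}.
# Offsets follow the UDIFResourceFile struct above (assumed packed).
FILE_STRUCTS = {
    "dmg": {
        "SegmentNumber": (56, 4),
        "XMLOffset": (216, 8),
        "XMLLength": (224, 8),
    },
}


def field_value(filetype: str, name: str, header: bytes) -> int:
    """Resolve a named field to an integer, as file.struct:<name> might.

    The number of bytes read comes from the registry, which is the idea
    behind letting byte_test take its length from the field.
    """
    offset, size = FILE_STRUCTS[filetype][name]
    raw = header[offset:offset + size]
    if len(raw) != size:
        raise ValueError(f"buffer too short for field {name}")
    return int.from_bytes(raw, "big")  # assumption: big-endian fields


def match_in_window(filetype, data, needle, offset_field, length_field, base=0):
    """Emulate content:<needle>; offset:<offset_field>; depth:<length_field>;"""
    header = data[base:]
    start = field_value(filetype, offset_field, header)
    length = field_value(filetype, length_field, header)
    return needle in data[start:start + length]
```

Under these assumptions, the partial signature above corresponds to checking `field_value("dmg", "SegmentNumber", data) != 0` and then calling `match_in_window("dmg", data, b"!ENTITY ", "XMLOffset", "XMLLength")`. The rule writer never touches a numeric offset; the registry carries that knowledge.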

Use Cases

There are plenty of use cases. Any time the magic bytes of a file type are used, they can simply be replaced with file.magic:0,<filetype>;. The additional features are helpful in many scenarios, primarily exploit detection, but they would also make for great hunting rules for 'odd' files flying over the wire.

Happy to clarify any of the above, elaborate on use cases, or answer questions in general on this ask.

Related issues 1 (1 open, 0 closed)

Related to Suricata - Task #5893: tracking: deep file awareness and inspection (Assigned, Victor Julien)

#1

Updated by Victor Julien about 1 year ago

  • Related to Task #5893: tracking: deep file awareness and inspection added

#2

Updated by Victor Julien about 1 year ago

  • Assignee changed from OISF Dev to Community Ticket

I think these are good ideas. One major complication is that we currently have no good file classification capability. file.magic is neither performant nor cross-platform, which is why we see it really only being used in output or local rulesets.

I've linked this to a new tracking ticket, #5893. In the short term we have no plans to do work in this direction, so if this is something you'd want to speed up, the shortest route is to contribute it. I do feel it will be in scope, so I'm happy to take contributions that take suri in this direction.

