File Extraction

Starting with Suricata version 1.2 it's possible to extract files from HTTP sessions as well as match on file name, extension and "magic".


The file extraction code works on top of the HTTP parser, which itself is largely a wrapper for libhtp. The HTTP parser takes care of dechunking and unzipping the request and/or response data if necessary. The HTTP parser, in turn, runs on top of the stream reassembly engine.

This means that settings in the stream engine, reassembly engine and the HTTP parser all affect the workings of the file extraction.

What files are actually extracted and stored to disk is controlled by the rule language.


stream.checksum_validation controls whether or not the stream engine rejects packets with invalid checksums. This is normally a good idea, but if the network interface performs checksum offloading, a lot of packets may seem to be broken. This setting is enabled by default, and can be disabled by setting it to "no". Note that checksum handling can also be controlled per interface; see "checksum_checks" in the example configuration.

stream.reassembly.depth controls how far into a stream reassembly is done. Beyond this value no reassembly will be done, which means that the HTTP session will no longer be tracked past that point. By default a setting of 1 Megabyte is used. A value of 0 sets it to unlimited.

libhtp.default-config.request-body-limit / libhtp.server-config.<config>.request-body-limit controls how much of the HTTP request body is tracked for inspection by the http_client_body keyword. It is also used to limit file inspection. A value of 0 means unlimited.

libhtp.default-config.response-body-limit / libhtp.server-config.<config>.response-body-limit is like the request body limit, only it applies to the HTTP response body.
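
How these limits combine can be illustrated with a simplified model. The sketch below is an approximation under stated assumptions, not Suricata's actual logic: it treats 0 as unlimited, as described above, and takes the smaller of the reassembly depth and the body limit as a rough upper bound on how many body bytes stay visible to file inspection. It ignores that headers and earlier traffic on the same connection also count toward the reassembly depth. The function name effective_inspection_limit is hypothetical.

```python
from typing import Optional


def effective_inspection_limit(reassembly_depth: int, body_limit: int) -> Optional[int]:
    """Return a rough upper bound in bytes, or None when both limits are unlimited.

    Simplified model (assumption): a value of 0 means "unlimited", and the
    tighter of the two limits wins.
    """
    finite = [limit for limit in (reassembly_depth, body_limit) if limit > 0]
    return min(finite) if finite else None


if __name__ == "__main__":
    one_mb = 1024 * 1024  # the default reassembly depth mentioned above
    print(effective_inspection_limit(one_mb, 0))     # body limit unlimited
    print(effective_inspection_limit(one_mb, 4096))  # body limit is tighter
    print(effective_inspection_limit(0, 0))          # everything unlimited
```

In this model, raising only one of the two settings does not help if the other one is still the tighter limit.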

NIC offloading

NIC offloading should be disabled:

apt-get install ethtool

ethtool -k eth3

Offload parameters for eth3:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off

Everything should be off. If it is not, here is how you can disable each feature:

ethtool -K eth3 tso off
ethtool -K eth3 gro off
ethtool -K eth3 lro off
ethtool -K eth3 gso off
ethtool -K eth3 rx off
ethtool -K eth3 tx off
ethtool -K eth3 sg off
ethtool -K eth3 rxvlan off
ethtool -K eth3 txvlan off

Note the difference between the lowercase -k (show the current settings) and the uppercase -K (change a setting).
Make sure you use the appropriate interface name for your system (eth0, eth1, eth5, ...).
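
The check-and-disable steps above can be sketched as a small helper. This is a minimal sketch, assuming the "name: value" output format shown above; the function names enabled_offloads and disable_commands are hypothetical, and the short option names are limited to the ones used in the ethtool -K commands listed above. It only builds the command strings and does not run them.

```python
from typing import Dict, List

# Short option names taken from the ethtool -K commands listed above.
SHORT_NAMES: Dict[str, str] = {
    "rx-checksumming": "rx",
    "tx-checksumming": "tx",
    "scatter-gather": "sg",
    "tcp-segmentation-offload": "tso",
    "generic-segmentation-offload": "gso",
    "generic-receive-offload": "gro",
    "large-receive-offload": "lro",
    "rx-vlan-offload": "rxvlan",
    "tx-vlan-offload": "txvlan",
}


def enabled_offloads(ethtool_output: str) -> List[str]:
    """Return the feature names that `ethtool -k` reports as "on"."""
    enabled = []
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        # Skip lines without a value, such as the "Offload parameters" header.
        if not sep or not value.strip():
            continue
        # Only look at the first word, so a trailing marker such as "[fixed]" is ignored.
        if value.split()[0] == "on":
            enabled.append(name.strip())
    return enabled


def disable_commands(interface: str, ethtool_output: str) -> List[str]:
    """Build `ethtool -K` commands for enabled features with a known short name."""
    return [
        f"ethtool -K {interface} {SHORT_NAMES[name]} off"
        for name in enabled_offloads(ethtool_output)
        if name in SHORT_NAMES
    ]
```

The input text could come, for example, from subprocess.run(["ethtool", "-k", "eth3"], capture_output=True, text=True).stdout. Features that are enabled but have no entry in SHORT_NAMES are reported by enabled_offloads and left out of the generated commands.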


For file extraction two separate output modules were created: "file-log" and "file-store". They need to be enabled in the suricata.yaml. For "file-store", the "files" drop dir must be configured.

  - file-store:
      enabled: yes      # set to yes to enable
      log-dir: files    # directory to store the files
      force-magic: no   # force logging magic on all stored files
      force-md5: no     # force logging of md5 checksums
      waldo: file.waldo # waldo file to store the file_id across runs

Each file that is stored will have a name of the form "file.<id>". The id will be reset and files will be overwritten unless the waldo option is used.

  - file-log:
      enabled: yes
      filename: files-json.log
      append: yes
      #filetype: regular # 'regular', 'unix_stream' or 'unix_dgram'
      force-magic: no   # force logging magic on all logged files
      force-md5: no     # force logging of md5 checksums
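
The log written by file-log can be post-processed with a few lines of code. The following is a minimal sketch, assuming the log contains one JSON object per line (suggested by the file name files-json.log, but not stated above). The function names and the field names "filename" and "magic" in the sample are assumptions; check them against your own log before relying on them.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_file_log(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line, skipping lines that are not JSON objects."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written last line
        if isinstance(record, dict):
            yield record


def records_matching(lines: Iterable[str], field: str, needle: str) -> Iterator[Dict[str, Any]]:
    """Yield records whose `field` value contains `needle` (case-insensitive)."""
    for record in parse_file_log(lines):
        if needle.lower() in str(record.get(field, "")).lower():
            yield record


if __name__ == "__main__":
    # Made-up sample lines; the field names are assumptions.
    sample = [
        '{"filename": "/docs/report.pdf", "magic": "PDF document, version 1.4"}',
        '{"filename": "/img/logo.png", "magic": "PNG image data"}',
    ]
    for rec in records_matching(sample, "magic", "pdf"):
        print(rec["filename"])
```

To read a real log, pass an open file object instead of the sample list, for example records_matching(open("files-json.log"), "magic", "pdf").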


Without rules in place no extraction will happen. The simplest rule would be:

alert http any any -> any any (msg:"FILE store all"; filestore; sid:1; rev:1;)

This will simply store all files to disk.

Want to store all files with a pdf extension?

alert http any any -> any any (msg:"FILE PDF file claimed"; fileext:"pdf"; filestore; sid:2; rev:1;)

Or rather all actual pdf files?

alert http any any -> any any (msg:"FILE pdf detected"; filemagic:"PDF document"; filestore; sid:3; rev:1;)

Bundled with the Suricata download is a file with more example rules. In the archive, go to the rules/ directory and check the files.rules file.
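
When many extensions need to be stored, rules following the fileext example above can be generated rather than written by hand. This is a minimal sketch: the function name filestore_rules and the starting sid are arbitrary choices made for illustration, and only plain alphanumeric extensions are accepted so that nothing needs escaping inside the rule.

```python
from typing import Iterable, List


def filestore_rules(extensions: Iterable[str], first_sid: int = 1000001) -> List[str]:
    """Build one filestore rule per extension, following the fileext example above.

    first_sid is an arbitrary starting value chosen for this sketch.
    """
    rules = []
    for offset, ext in enumerate(extensions):
        ext = ext.lstrip(".").lower()
        if not ext.isalnum():
            raise ValueError(f"unsupported extension: {ext!r}")
        rules.append(
            'alert http any any -> any any '
            f'(msg:"FILE {ext} file claimed"; fileext:"{ext}"; filestore; '
            f'sid:{first_sid + offset}; rev:1;)'
        )
    return rules


if __name__ == "__main__":
    for rule in filestore_rules(["pdf", "exe", "doc"]):
        print(rule)
```

Each generated rule gets its own sid, counting up from first_sid, so the output can be appended to a rule file as long as that sid range is not already in use.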


Suricata can calculate MD5 checksums of files on the fly and log them. See MD5 for an explanation on how to enable this.