For suricata-3.0RC1 file extraction: when we download a file with the Chrome browser, the target file ends up split into several file.x and file.x.meta pieces. We suspect: (1) suricata-3.0RC1 cannot handle file extraction for a file downloaded with a multi-threaded (multi-connection) downloader, so the target file is split into several file.x and file.x.meta pieces. How do we merge these split files?
(2) suricata-3.0RC1 cannot handle file extraction for a file downloaded with breakpoint resume (HTTP range/resume requests), so the target file is split into several file.x and file.x.meta pieces. How do we merge these split files?
Thank you so much for your generous help to a beginner!
Not sure what the target file is (is it the pdf file?), but I do see data in the pcaps for (2). For HTTP it can be (1) or (2), and it doesn't really matter which, since it ends up being pretty much the same in the end: the file is downloaded in chunks either way (for (1) I presume you mean multiple flows for the multiple threads).
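If you want to confirm which of the two you are hitting, something along these lines against the capture should show it (download.pcap is a placeholder name; the tshark field names are the standard Wireshark ones):

  # one line per HTTP request with its TCP stream number: many streams for the
  # same URI suggests a multi-connection download, requests carrying a Range
  # header suggest breakpoint resume
  tshark -r download.pcap -Y 'http.request' -T fields -e tcp.stream -e http.request.method -e http.request.uri
  tshark -r download.pcap -Y 'http.request.line contains "Range:"' -T fields -e tcp.stream -e http.request.uri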
Your files consumer has to parse the pieces and rearrange them.
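As a minimal sketch of what such a consumer could do, assuming the classic filestore layout (every piece stored as file.N plus a file.N.meta carrying a FILENAME line), that the pieces are complete and non-overlapping, and that ascending N matches the order of the data on the wire; with range/resume traffic you should verify the offsets against the Content-Range headers in the pcap rather than trust the numbering. target.pdf is a placeholder name:

  cd files/
  # collect the piece numbers that belong to the target file, in numeric order
  pieces=$(grep -l 'FILENAME:.*target\.pdf' *.meta | sed 's/\.meta$//' | sort -t. -k2 -n)
  # concatenate them and check the result against the checksum of the original
  cat $pieces > merged.pdf
  md5sum merged.pdf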
I think I am hitting a similar problem on 3.0rc1 (github version). When using a browser (no matter which browser) to download a large file (from about 100 MB to over 1 GB), the download times out and eventually fails. However, when downloading with wget, even if it hits a timeout it retries and the download completes.
I am running Suricata in af_packet mode with md5 and filestore enabled. I think the problem may be in libhtp. The same problem also happened on 2.1dev without md5 and filestore. I have tested this many times on different networks with the same result.
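One thing that may be worth ruling out with files in the 100 MB to 1 GB range is the depth/body limits in suricata.yaml (stream reassembly depth and the libhtp request/response body limits), since the defaults are much smaller than that and can cut extraction short. A quick way to see what your config sets (the path is the stock one; adjust as needed):

  grep -n -E 'depth|body-limit' /etc/suricata/suricata.yaml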
Subject changed from "For suricata-3.0RC1 file extraction, when we download file by using the chrome browser, the aim file was truncated into several file.x and file.x.meta. Maybe multi-thread download or breakpoint resume?" to "3.0RC1 file extraction"
Priority changed from High to Normal
hao chen - When testing file extraction you should always test with "wget" or similar; otherwise the browser cache comes into play and can affect the extraction.
As regards the target pdf file that you are trying to extract: what is the name and MD5 sum of the file? (so I can run the relevant test).
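Something along these lines is what I mean, with the URL being just a placeholder for wherever the pdf lives:

  # fetch the file directly, bypassing any browser cache, and record its MD5
  wget -O target.pdf 'http://example.com/path/to/target.pdf'
  md5sum target.pdf

That sum can then be compared against the md5sum of the extracted (or reassembled) file under files/.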
The file utility on Linux does not recognize the file as a pdf either, so most likely there is some problem with the completeness of the pcap and/or the traffic:
root@LTS-64-1:~/Tests/bug-1609/log # file files/file.2
files/file.2: data
root@LTS-64-1:~/Tests/bug-1609/log #