Bug #705
http.log missing lots of requests under high traffic load
Status: Closed
Description
Using:
suricata --build-info
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:560) <Info> (SCPrintBuildInfo) -- This is Suricata version 1.4dev (rev 5f4c528)
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:633) <Info> (SCPrintBuildInfo) -- Features: PCAP_SET_BUFF LIBPCAP_VERSION_MAJOR=1 PF_RING AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG LIBNET1.1 HAVE_HTP_URI_NORMALIZE_HOOK HAVE_HTP_TX_GET_RESPONSE_HEADERS_RAW HAVE_NSS PROFILING
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:647) <Info> (SCPrintBuildInfo) -- 64-bits, Little-endian architecture
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:649) <Info> (SCPrintBuildInfo) -- GCC version 4.6.3, C version 199901
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:655) <Info> (SCPrintBuildInfo) -- __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:658) <Info> (SCPrintBuildInfo) -- __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:661) <Info> (SCPrintBuildInfo) -- __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:664) <Info> (SCPrintBuildInfo) -- __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:667) <Info> (SCPrintBuildInfo) -- __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:671) <Info> (SCPrintBuildInfo) -- compiled with -fstack-protector
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:677) <Info> (SCPrintBuildInfo) -- compiled with _FORTIFY_SOURCE=2
[23628] 10/1/2013 -- 00:49:59 - (suricata.c:680) <Info> (SCPrintBuildInfo) -- compiled with libhtp 0.2.11, linked against 0.2.11
root@suricata:/var/data/regit/log/suricata# time tcpdump -i eth3 -n -s 0 -w - | grep -a -o -E "Host\: .*|GET \/.*" >gets.txt
tcpdump: WARNING: eth3: no IPv4 address assigned
tcpdump: listening on eth3, link-type EN10MB (Ethernet), capture size 65535 bytes
^C637676 packets captured
4955063 packets received by filter
4317303 packets dropped by kernel
1 packet dropped by interface

real    0m3.942s
user    0m2.824s
sys     0m1.420s
root@suricata:/var/data/regit/log/suricata# grep GET gets.txt |wc -l
1529
So for 3 seconds we have 1529 HTTP GET requests with tcpdump (and that is just GET)
root@suricata:/var/data/regit/log/suricata# date && wc -l http.log
Thu Jan 10 00:14:16 EET 2013
518476 http.log
root@suricata:/var/data/regit/log/suricata# date && wc -l http.log
Thu Jan 10 00:14:18 EET 2013
518480 http.log
root@suricata:/var/data/regit/log/suricata# date && wc -l http.log
Thu Jan 10 00:14:19 EET 2013
518489 http.log
and in the same 3 seconds Suricata writes only 13 HTTP log entries to http.log, on the same ~10 Gbit/s traffic.
root@suricata:/var/data/regit/log/suricata# wc -l gets.txt
3209 gets.txt
root@suricata:/var/data/regit/log/suricata# tail -8 gets.txt
Host: redirector.c.youtube.com
GET /v302708503/6baa/fIDnzBb3MPE.jpg HTTP/1.1
Host: cs302708.userapi.com
GET /csi?v=2&s=youtube&action=results&e=901312,916623,922911,919108,920704,912806,922403,922405,929901,913605,929104,929110,920201,913302,919009,911116,926403,901451&li=1&rt=ct.281,js_head.591,js_page.863,js_foot.865,ol.1208,aft.1208 HTTP/1.1
Host: csi.gstatic.com
GET /socket.io/1/?t=1357769748997 HTTP/1.1
Host: ws.9gag.com:8084
GET /vi/9d6IsU01-HE/hqdefault.jpg HTTP/1.1
root@suricata:/var/data/regit/log/suricata#
My http-log section in suricata.yaml:
  # a line based log of HTTP requests (no alerts)
  - http-log:
      enabled: yes
      filename: http.log
      append: yes
      extended: yes     # enable this for extended logging information
      #filetype: regular # 'regular', 'unix_stream' or 'unix_dgram'
Just for general info: setting extended to yes or no has no effect on the number of logs written.
I think 1.3.1 didn't have that issue.
Updated by Victor Julien almost 12 years ago
Sounds like you're just experiencing packet loss.
Updated by Peter Manev almost 12 years ago
How can we be sure if this is the case?
I mean, all of these counters:
tcp.stream_depth_reached
tcp.reassembly_gap
tcp.segment_memcap_drop
capture.kernel_drops
tcp.ssn_memcap_drop
defrag.max_frag_hits
tcp.invalid_checksum
defrag.ipv6.timeouts
defrag.ipv4.timeouts
are zeros (0) in stats.log for every thread (running af_packet).
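For reference, a quick way to eyeball those counters on a live sensor is to grep the latest dump interval in stats.log (the log path is assumed to be the default here):

# show the most recent values of the usual drop/gap counters for all threads
tail -n 200 /var/log/suricata/stats.log | grep -E "kernel_drops|reassembly_gap|segment_memcap_drop|ssn_memcap_drop|invalid_checksum|max_frag_hits"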
Updated by Anoop Saldanha almost 12 years ago
The regex you have there is not a definitive way to count the HTTP requests from tcpdump.
Not saying that we don't have an issue here; 15 requests for that network does sound suspicious. Would it be possible to show the same behaviour with a pcap?
Updated by Peter Manev almost 12 years ago
No, the regex is not :) - but here is a more definitive way (I think), with tshark:
root@suricata:/var/data/peter/http-problem# tshark -r http-problem.pcap -R "http.request" -T fields -e http.request.method |wc -l
tshark: Lua: Error during loading:
Running as user "root" and group "root". This could be dangerous.
14799
root@suricata:/var/data/peter/http-problem#
tshark shows 14K requests from the pcap.
root@suricata:/var/data/peter/http-problem# capinfos http-problem.pcap
File name:           http-problem.pcap
File type:           Wireshark/tcpdump/... - libpcap
File encapsulation:  Ethernet
Packet size limit:   file hdr: 65535 bytes
Number of packets:   4798877
File size:           4632192790 bytes
Data size:           4555410734 bytes
Capture duration:    9 seconds
Start time:          Thu Jan 10 19:20:38 2013
End time:            Thu Jan 10 19:20:47 2013
Data byte rate:      533608406.22 bytes/sec
Data bit rate:       4268867249.74 bits/sec
Average packet size: 949.27 bytes
Average packet rate: 562127.38 packets/sec
SHA1:                eb62c3788428230bfdb50c6577b1bd61e7d0ebfb
RIPEMD160:           9dcb0acabd482aa460ae192af4cc3b45c15f0ae3
MD5:                 9bd082aab4a4373f3625ba125ba2e482
Strict time order:   False
root@suricata:/var/data/peter/http-problem#
Running the same pcap with --runmode=single and zero rules:
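(The exact invocation is not in the ticket; it was presumably along these lines, with the config path and the empty rules file being assumptions:)

suricata -c /etc/suricata/suricata.yaml --runmode=single -r http-problem.pcap -S empty.rules -l /var/data/peter/http-problem/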
[27313] 10/1/2013 -- 21:23:44 - (source-pcap-file.c:190) <Info> (ReceivePcapFileLoop) -- pcap file end of file reached (pcap err code 0)
[27312] 10/1/2013 -- 21:23:44 - (suricata.c:2013) <Info> (main) -- Signal Received. Stopping engine.
[27314] 10/1/2013 -- 21:23:44 - (flow-manager.c:554) <Info> (FlowManagerThread) -- 0 new flows, 0 established flows were timed out, 0 flows in closed state
[27312] 10/1/2013 -- 21:23:44 - (suricata.c:2049) <Info> (main) -- time elapsed 32.480s
[27313] 10/1/2013 -- 21:23:44 - (source-pcap-file.c:290) <Info> (ReceivePcapFileThreadExitStats) -- Pcap-file module read 4798877 packets, 4555410734 bytes
[27313] 10/1/2013 -- 21:23:44 - (stream-tcp.c:4180) <Info> (StreamTcpExitPrintStats) -- Stream TCP processed 2191278 TCP packets
[27313] 10/1/2013 -- 21:23:44 - (alert-fastlog.c:321) <Info> (AlertFastLogExitPrintStats) -- Fast log output wrote 0 alerts
[27313] 10/1/2013 -- 21:23:44 - (log-httplog.c:617) <Info> (LogHttpLogExitPrintStats) -- HTTP logger logged 349 requests
[27313] 10/1/2013 -- 21:23:44 - (log-tlslog.c:528) <Info> (LogTlsLogExitPrintStats) -- TLS logger logged 15 requests
[27313] 10/1/2013 -- 21:23:44 - (log-file.c:432) <Info> (LogFileLogExitPrintStats) -- (PcapFile) Files logged: 87
[27312] 10/1/2013 -- 21:23:45 - (host.c:244) <Info> (HostPrintStats) -- host memory usage: 349376 bytes, maximum: 16777216
[27312] 10/1/2013 -- 21:23:45 - (util-profiling-rules.c:299) <Info> (SCProfilingRuleDump) -- Dumping profiling data for 0 rules.
[27312] 10/1/2013 -- 21:23:45 - (util-profiling-rules.c:416) <Info> (SCProfilingRuleDump) -- Done dumping profiling data.
[27312] 10/1/2013 -- 21:23:45 - (detect.c:3942) <Info> (SigAddressCleanupStage1) -- cleaning up signature grouping structure... complete
[27312] 10/1/2013 -- 21:23:45 - (util-profiling.c:275) <Info> (SCProfilingDump) -- Done dumping profiling data.
HTTP logger logged 349 requests
Updated by Erik C almost 11 years ago
I see there is a "target version" set to TBD. Is there any update on that? I am struggling with this identical issue and really want to use Suricata for dedicated HTTP logging (to cover IPv6 HTTP logging), as I already use it for IDS. Being able to deploy the same software universally, with only config changes between purposes, would go a long way towards simplifying my life. Currently I am using a urllog program built on libnids, which unfortunately is IPv4 only.
Thank you for any update on this issue.
Updated by Victor Julien almost 11 years ago
I suspect this is either caused by packet loss, or traffic weirdness like async routing or bad checksums.
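For what it's worth, the bad-checksum angle can be checked or ruled out via the existing knobs in suricata.yaml; a minimal sketch (the interface name is taken from the report, the rest mirrors the stock config layout and is not a recommendation):

stream:
  checksum-validation: yes   # set to 'no' if the capture NIC delivers offloaded/mangled checksums

af-packet:
  - interface: eth3
    checksum-checks: kernel  # 'yes', 'no' and 'auto' are also accepted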
Updated by Erik C almost 11 years ago
I am running 1.4.6. Below is the only ixgbe configuration I could find that gets me a full 10-gig link packet dump using tcpdump. Without these ethtool modifications it is impossible for us to see all the packets, as evidenced by tcpdump and the ifconfig packet stats. Does this have something to do with why we are not getting our URL logs? If so, how can this be managed without modifying our ixgbe configuration?
ethtool -G ${i} rx 4096
ethtool -C ${i} rx-usecs 10000
ethtool -C $i adaptive-rx off
ethtool -K $i tso off
ethtool -K $i gro off
ethtool -K $i lro off
ethtool -K $i gso off
ethtool -K $i rx off
ethtool -K $i tx off
ethtool -K $i sg off
ethtool -i tgig0
driver: ixgbe
version: 3.6.7-k
firmware-version: 0x29b70001
bus-info: 0000:0a:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
root@snots:/home/dpoole# ethtool tgig0
Settings for tgig0:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: No
Advertised link modes: 10000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: 10000Mb/s
Duplex: Full
Port: Direct Attach Copper
PHYAD: 0
Transceiver: external
Auto-negotiation: off
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
ifenslave bond0 $i
done
Updated by Erik C almost 11 years ago
Update: running auto-negotiation causes all forms of IDS to "not work" on our traffic. We have to disable auto-negotiation in order to get a complete picture of what is going across our bond.
Updated by Peter Manev almost 11 years ago
The issue is not caused by packet loss or traffic weirdness. I can reproduce the issue on a repeatable basis (which I just re-did).
On our high-speed test box, if I do a tcpdump and run the resulting pcap through Suricata, I get all the logs.
Example:
On the same live-traffic box, a 4-second tcpdump capture producing a 4 GB pcap, run through Suricata, results in over 6K HTTP requests written to http.log, while at any given 4-5 second interval on live traffic the HTTP logs are no more than 200-400.
Updated by Erik C over 10 years ago
Can confirm this is not related to #1082, which presumably was patched in the 2.0 release (released on the 28th, so very recently at least).
Updated by Eric Leblond over 10 years ago
- Priority changed from Normal to High
I've done some tests related to this problem on a 10G test box:
- tcpdump + pcap replay in suricata shows 1000 HTTP request/s
- Only 30 HTTP logs/s when parsing live traffic
- Only 30 HTTP logs/s when parsing "any" subset of live traffic (port 1000:3000 and port 3001:65535)
- 300 HTTP logs/s if we write to /run/shm instead of a plain hard disk
- Still 30 HTTP logs/s after suppressing the fflush call after writing to the fd
- Still 30 HTTP logs/s after suppressing the lock around the write (if only \n is written, the number of lines is correct)
This issue is really strange; Suri is logging 20 times below reality.
Updated by Victor Julien over 10 years ago
Eric Leblond wrote:
I've done some tests related to this problem on a 10G test box:
- tcpdump + pcap replay in suricata shows 1000 HTTP request/s
If you replay this in suri, does it log all?
Updated by Eric Leblond over 10 years ago
Victor Julien wrote:
Eric Leblond wrote:
I've done some tests related to this problem on a 10G test box:
- tcpdump + pcap replay in suricata shows 1000 HTTP request/s
If you replay this in suri, does it log all?
Yes, exactly: tcpdump for 3 seconds saving to a pcap, then -r pcap in Suri.
Updated by Victor Julien over 10 years ago
Do you happen to have very aggressive flow-timeout settings for tcp? I wonder if we may discard the flow before the http session is complete and ready to be logged.
Updated by Eric Leblond over 10 years ago
Victor Julien wrote:
Do you happen to have very aggressive flow-timeout settings for tcp? I wonder if we may discard the flow before the http session is complete and ready to be logged.
Here's the part of the YAML:
flow-timeouts:
  default:
    new: 5 #30
    established: 10 # 300
    closed: 0
    emergency-new: 1 #10
    emergency-established: 2 #100
    emergency-closed: 0
  tcp:
    new: 5 #60
    established: 100 # 3600
    closed: 10 #30
    emergency-new: 1 # 10
    emergency-established: 5 # 300
    emergency-closed: 20 #20
  udp:
    new: 5 #30
    established: 5 # 300
    emergency-new: 5 #10
    emergency-established: 5 # 100
  icmp:
    new: 5 #30
    established: 5 # 300
    emergency-new: 5 #10
    emergency-established: 5 # 100
Updated by Victor Julien over 10 years ago
Was emergency mode active during the live processing tests?
Updated by Eric Leblond over 10 years ago
Victor Julien wrote:
Was emergency mode active during the live processing tests?
No trace of it.
Updated by Victor Julien over 10 years ago
Did you have stream.midstream enabled or disabled (in both the live and the replay case)?
I'm noticing something odd: on the DK box I get about 10k HTTP logs per minute with midstream disabled, but only about 2.5k/min with midstream enabled.
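For reference, the setting being toggled here lives under the stream section of suricata.yaml; a minimal sketch (the value shown is just the non-midstream case from the comparison above):

stream:
  midstream: false   # 'true' allows picking up sessions mid-stream, the slower case above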
Updated by Peter Manev over 10 years ago
I have tried both (midstream on/off on live traffic) on the 10G test box - no effect on the numbers.
Updated by Victor Julien over 10 years ago
So looking further at this midstream thing, I noticed that in the non-midstream case, after 10mins we have:
tcp.reassembly_memuse | AFPacketeth21 | 955346671
Or about 900 MB.
In the midstream case we have:
tcp.reassembly_memuse | AFPacketeth21 | 4294967261
Or 4 GB, which is also the reassembly memcap limit on this box.
I think there is an issue with 'ageing out' segments in the session in the midstream case. This then causes segments to not be freed.
This case could easily lead to missing http requests, as the tcp reassembly fails more often (memcap reached) and thus reassembly doesn't feed data to the app layer parsers.
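If the reassembly memcap is indeed the bottleneck here, a workaround (not a fix for the ageing-out behaviour described above) would be to raise it; a sketch, with the value purely illustrative:

stream:
  reassembly:
    memcap: 8gb   # this box was effectively hitting the 4gb limit reported above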
Updated by Erik C over 10 years ago
Any word on this? I am anxious to get this up and functioning.
Updated by john kely almost 10 years ago
I tested on 500 Mbps traffic. Results:
midstream = false -> HTTP requests are all logged
midstream = true -> about 5% of HTTP requests are missed
This is complicated.
Updated by Andreas Herz over 8 years ago
- Assignee changed from Anoop Saldanha to OISF Dev
Updated by Andreas Herz over 5 years ago
Would it be rather hard to try to reproduce this again with the 5.0 beta :)?
Or do any of you still have issues like that?
Updated by Peter Manev over 5 years ago
I would not be interested in http.log anymore :), but rather in the "event_type: http" records in eve.json.
Having a look at it now, after all the changes and improvements made to the latest Suricata code since the ticket was opened :), this very well may be related to drops or wrong_thread counters and similar config and setup issues.
I am in favor of closing this till we actually have a reproducible case.
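For anyone re-checking this on a current release, a rough comparison can be made by counting HTTP events in eve.json against what tshark sees in a capture taken over the same window (jq is assumed to be installed; paths and file names are illustrative):

# count HTTP events logged by Suricata
jq -c 'select(.event_type=="http")' /var/log/suricata/eve.json | wc -l

# count HTTP requests tshark sees in a parallel capture
# (current tshark uses -Y for display filters where the older command above used -R)
tshark -r sample.pcap -Y "http.request" -T fields -e http.request.method | wc -l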