Bug #2355
Closed
Missing events with PF_RING 7.1.0
Description
After building Suricata (4.0.0 and later, didn't try 3.x) with PF_RING 7.1.0, the EVE output no longer contains app-level events, only flows, some alerts (only those that do not require app-layer decoding) and stats. The stats.log confirms that app-layer decoding is apparently limited:
AF_PACKET:
------------------------------------------------------------------------------------
Date: 12/11/2017 -- 13:52:26 (uptime: 0d, 00h 06m 04s)
------------------------------------------------------------------------------------
Counter | TM Name | Value
------------------------------------------------------------------------------------
capture.kernel_packets | Total | 686336033
capture.kernel_drops | Total | 234516595
decoder.pkts | Total | 452024541
decoder.bytes | Total | 326598481839
decoder.invalid | Total | 752032
decoder.ipv4 | Total | 450564379
decoder.ipv6 | Total | 13465
decoder.ethernet | Total | 465278010
decoder.tcp | Total | 413616530
decoder.udp | Total | 26575273
decoder.icmpv4 | Total | 8305400
decoder.icmpv6 | Total | 50
decoder.gre | Total | 275
decoder.vlan | Total | 422557560
decoder.teredo | Total | 12613
decoder.avg_pkt_size | Total | 722
decoder.max_pkt_size | Total | 1514
flow.tcp | Total | 2239259
flow.udp | Total | 1106157
flow.icmpv6 | Total | 3
defrag.ipv4.fragments | Total | 10385
defrag.ipv4.reassembled | Total | 4960
decoder.icmpv4.ipv4_unknown_ver | Total | 9
decoder.tcp.invalid_optlen | Total | 8
decoder.vlan.unknown_type | Total | 752015
tcp.sessions | Total | 1552824
tcp.pseudo | Total | 36994
tcp.syn | Total | 4537984
tcp.synack | Total | 3343138
tcp.rst | Total | 2687129
tcp.stream_depth_reached | Total | 3222
tcp.reassembly_gap | Total | 22404
tcp.overlap | Total | 148635
tcp.insert_list_fail | Total | 130
detect.alert | Total | 45
app_layer.flow.http | Total | 17014
app_layer.tx.http | Total | 33506
app_layer.flow.smtp | Total | 1195
app_layer.tx.smtp | Total | 1455
app_layer.flow.tls | Total | 111789
app_layer.flow.ssh | Total | 10
app_layer.flow.smb | Total | 551
app_layer.flow.dcerpc_tcp | Total | 554
app_layer.flow.dns_tcp | Total | 39
app_layer.tx.dns_tcp | Total | 57
app_layer.flow.enip_tcp | Total | 2
app_layer.tx.enip_tcp | Total | 4313
app_layer.flow.failed_tcp | Total | 25526
app_layer.flow.dns_udp | Total | 468444
app_layer.tx.dns_udp | Total | 1361992
app_layer.flow.enip_udp | Total | 189
app_layer.tx.enip_udp | Total | 194
app_layer.flow.failed_udp | Total | 637524
flow_mgr.closed_pruned | Total | 443247
flow_mgr.new_pruned | Total | 1945025
flow_mgr.est_pruned | Total | 13
flow.spare | Total | 8800
flow.tcp_reuse | Total | 429273
flow_mgr.flows_checked | Total | 409914
flow_mgr.flows_notimeout | Total | 402764
flow_mgr.flows_timeout | Total | 7150
flow_mgr.flows_timeout_inuse | Total | 247
flow_mgr.flows_removed | Total | 6903
flow_mgr.rows_checked | Total | 65536
flow_mgr.rows_skipped | Total | 39210
flow_mgr.rows_maxlen | Total | 39
tcp.memuse | Total | 164095088
tcp.reassembly_memuse | Total | 1385919392
dns.memuse | Total | 61260823
http.memuse | Total | 1740109925
flow.memuse | Total | 285200512
PF_RING:
------------------------------------------------------------------------------------
Date: 12/11/2017 -- 14:30:24 (uptime: 0d, 00h 05m 10s)
------------------------------------------------------------------------------------
Counter | TM Name | Value
------------------------------------------------------------------------------------
capture.kernel_packets | Total | 501027458
capture.kernel_drops | Total | 70551514
decoder.pkts | Total | 502718889
decoder.bytes | Total | 385563135748
decoder.invalid | Total | 1228617
decoder.ipv4 | Total | 501258422
decoder.ipv6 | Total | 11947
decoder.ethernet | Total | 517635913
decoder.tcp | Total | 464279594
decoder.udp | Total | 28373641
decoder.icmpv4 | Total | 7055425
decoder.icmpv6 | Total | 40
decoder.gre | Total | 202
decoder.vlan | Total | 502718889
decoder.vlan_qinq | Total | 327549685
decoder.teredo | Total | 11092
decoder.avg_pkt_size | Total | 766
decoder.max_pkt_size | Total | 1518
flow.tcp | Total | 914059
flow.udp | Total | 3199030
flow.icmpv6 | Total | 2
defrag.ipv4.fragments | Total | 7740
defrag.ipv4.reassembled | Total | 3703
decoder.icmpv4.ipv4_unknown_ver | Total | 6
decoder.tcp.invalid_optlen | Total | 6
decoder.vlan.unknown_type | Total | 1228605
tcp.sessions | Total | 535445
tcp.syn | Total | 3695043
tcp.synack | Total | 2811274
tcp.rst | Total | 2366381
detect.alert | Total | 34
app_layer.flow.dns_udp | Total | 1373463
app_layer.tx.dns_udp | Total | 1490375
app_layer.flow.enip_udp | Total | 170
app_layer.tx.enip_udp | Total | 197
app_layer.flow.failed_udp | Total | 1825397
flow_mgr.new_pruned | Total | 2855072
flow.spare | Total | 13371
flow_mgr.flows_checked | Total | 737245
flow_mgr.flows_notimeout | Total | 728821
flow_mgr.flows_timeout | Total | 8424
flow_mgr.flows_removed | Total | 8424
flow_mgr.rows_checked | Total | 65536
flow_mgr.rows_skipped | Total | 28340
flow_mgr.rows_maxlen | Total | 44
tcp.memuse | Total | 148958880
tcp.reassembly_memuse | Total | 3276800
dns.memuse | Total | 42790886
flow.memuse | Total | 371814208
It's interesting that the remaining flows always have no content towards the client and are tagged "state":"new","reason":"timeout". Here's an anonymized sample with all src IPs set to 1.1.1.1 and all dest IPs set to 2.2.2.2:
{"timestamp":"2017-12-11T16:55:15.342431+0000","flow_id":844118785026378,"event_type":"flow","src_ip":"1.1.1.1","src_port":34177,"dest_ip":"2.2.2.2","dest_port":161,"proto":"UDP","app_proto":"failed","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":95,"bytes_toclient":0,"start":"2017-12-11T16:54:44.223562+0000","end":"2017-12-11T16:54:44.223562+0000","age":0,"state":"new","reason":"timeout","alerted":false}} {"timestamp":"2017-12-11T16:55:15.342480+0000","flow_id":984877748197184,"event_type":"flow","src_ip":"1.1.1.1","src_port":161,"dest_ip":"2.2.2.2","dest_port":54333,"proto":"UDP","app_proto":"failed","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":101,"bytes_toclient":0,"start":"2017-12-11T16:54:44.137024+0000","end":"2017-12-11T16:54:44.137024+0000","age":0,"state":"new","reason":"timeout","alerted":false}} {"timestamp":"2017-12-11T16:55:15.342518+0000","flow_id":140560192272402,"event_type":"flow","src_ip":"1.1.1.1","src_port":161,"dest_ip":"2.2.2.2","dest_port":47842,"proto":"UDP","app_proto":"failed","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":93,"bytes_toclient":0,"start":"2017-12-11T16:54:44.161810+0000","end":"2017-12-11T16:54:44.161810+0000","age":0,"state":"new","reason":"timeout","alerted":false}} {"timestamp":"2017-12-11T16:55:15.342564+0000","flow_id":2110902209124558,"event_type":"flow","src_ip":"1.1.1.1","src_port":44909,"dest_ip":"2.2.2.2","dest_port":53,"proto":"UDP","app_proto":"dns","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":82,"bytes_toclient":0,"start":"2017-12-11T16:54:44.694478+0000","end":"2017-12-11T16:54:44.694478+0000","age":0,"state":"new","reason":"timeout","alerted":false}} {"timestamp":"2017-12-11T16:55:15.342610+0000","flow_id":703561685279778,"event_type":"flow","src_ip":"1.1.1.1","src_port":5273,"dest_ip":"2.2.2.2","dest_port":53,"proto":"UDP","app_proto":"dns","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":79,"bytes_toclient":0,"start":"2017-12-11T16:54:44.926754+0000","end":"2017-12-11T16:54:44.926754+0000","age":0,"state":"new","reason":"timeout","alerted":false}} {"timestamp":"2017-12-11T16:55:15.342644+0000","flow_id":1970211965416417,"event_type":"flow","src_ip":"1.1.1.1","src_port":53,"dest_ip":"2.2.2.2","dest_port":56298,"proto":"UDP","app_proto":"failed","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":311,"bytes_toclient":0,"start":"2017-12-11T16:54:44.177121+0000","end":"2017-12-11T16:54:44.177121+0000","age":0,"state":"new","reason":"timeout","alerted":false}}
I have prepared a VM containing a build to reproduce the issue. The image is available from https://steinbiss.name/suri/Debian-PFRING.ova -- SSH (port 3022, user:user) into the box; there's a setup there that should show the problem (see README.txt in /home/user).
Updated by Victor Julien almost 7 years ago
Looks like something changed around vlan handling. The PF_RING stats have a lot of vlan_qinq, in other words 2 layers of vlan. Is that actually part of the monitored network?
Updated by Sascha Steinbiss almost 7 years ago
Victor Julien wrote:
Looks like something changed around vlan handling. The PF_RING stats have a lot of vlan_qinq, in other words 2 layers of vlan. Is that actually part of the monitored network?
Probably not -- this may just have been a result of the tcpreplay situation on the test setup the initial two stats sets were taken from. For example, we have a second affected sensor which does not show this situation with PF_RING 7:
------------------------------------------------------------------------------------
Date: 12/12/2017 -- 13:32:17 (uptime: 0d, 04h 23m 02s)
------------------------------------------------------------------------------------
Counter | TM Name | Value
------------------------------------------------------------------------------------
capture.kernel_packets | Total | 5447912456
capture.kernel_drops | Total | 27609145
decoder.pkts | Total | 5448255815
decoder.bytes | Total | 4116807472187
decoder.ipv4 | Total | 5550847406
decoder.ipv6 | Total | 10494
decoder.ethernet | Total | 5448255815
decoder.tcp | Total | 4880628116
decoder.udp | Total | 148396011
decoder.icmpv4 | Total | 2302091
decoder.vlan | Total | 5448255815
decoder.teredo | Total | 10494
decoder.avg_pkt_size | Total | 755
decoder.max_pkt_size | Total | 1514
flow.tcp | Total | 37809373
flow.udp | Total | 12849534
defrag.ipv4.fragments | Total | 205202629
defrag.ipv4.reassembled | Total | 102599273
tcp.sessions | Total | 15164807
tcp.syn | Total | 15280775
tcp.synack | Total | 15462895
tcp.rst | Total | 8529129
detect.alert | Total | 6
app_layer.flow.dcerpc_udp | Total | 2773
app_layer.flow.dns_udp | Total | 3460986
app_layer.tx.dns_udp | Total | 3468702
app_layer.flow.enip_udp | Total | 18
app_layer.tx.enip_udp | Total | 644
app_layer.flow.failed_udp | Total | 9385757
flow_mgr.new_pruned | Total | 50462527
flow.spare | Total | 10403
flow_mgr.flows_checked | Total | 20749
flow_mgr.flows_notimeout | Total | 18278
flow_mgr.flows_timeout | Total | 2471
flow_mgr.flows_removed | Total | 2471
flow_mgr.rows_checked | Total | 165536
flow_mgr.rows_skipped | Total | 154467
flow_mgr.rows_empty | Total | 834
flow_mgr.rows_maxlen | Total | 7
tcp.memuse | Total | 33779760
tcp.reassembly_memuse | Total | 2621440
dns.memuse | Total | 1573767
flow.memuse | Total | 70014464
FYI: the problem is also present with PF_RING 7.0.0, not only 7.1.0.
Updated by Andreas Herz almost 7 years ago
- Assignee set to Anonymous
- Target version set to TBD
Updated by Victor Julien almost 7 years ago
- Assignee changed from Anonymous to OISF Dev
Updated by Victor Julien almost 7 years ago
- Target version changed from TBD to 70
Updated by Victor Julien almost 7 years ago
I can confirm the issue. I see it with PF_RING 6.6.0 as well. It seems that if you comment out https://github.com/OISF/suricata/blob/master/src/source-pfring.c#L262 it works.
I've sent a report to Alfredo to see if this is a bug in our code or in PF_RING.
Updated by Victor Julien almost 7 years ago
Alfredo told me that PF_RING does not strip the vlan header (in fact, if the hardware removes it, PF_RING puts it back). So we have to choose between relying on the parsed info PF_RING provides in the extended header or simply parsing the vlan header ourselves.
This appears to be different from before, as Eric wrote in 1fa4233d67d068fd49155e9d153011a491125833: "PF_RING is delivering the packet with VLAN header stripped. This patch updates the code to get the information from PF_RING extended header information.".
Looking at https://github.com/ntop/PF_RING/commit/15d60aa5b4002abacd5f7580f8c8b8888ac10e27, it is also suggested that the vlan header may not be present.
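As a quick sanity check on whether tagged frames actually reach the capture interface, tcpdump can be run next to Suricata (a sketch only; the interface name is a placeholder). With -e the link-level header is printed, so 802.1Q/QinQ tags are visible, and the 'vlan' capture filter means that no output at all indicates no tagged frames arrive:

# Sketch: interface name is a placeholder; -e prints the link-layer header,
# and the 'vlan' filter only matches 802.1Q-tagged frames.
sudo tcpdump -e -nn -c 20 -i enp0s8 vlan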
Updated by Victor Julien almost 7 years ago
- Status changed from New to Feedback
- Assignee changed from OISF Dev to Victor Julien
- Target version changed from 70 to 4.1beta1
Should be fixed by https://github.com/OISF/suricata/pull/3107
Please test!
Updated by Sascha Steinbiss almost 7 years ago
Please test!
I just tested the latest master in my VM with PF_RING 6.6.0 and 7.1.0, after confirming that the pfring patches were merged.
Your changes, however, apparently did not fix my original problem -- with 7.0.0 and 7.1.0 the same behaviour was observed as before (no app-layer decoding), while on 6.6.0 it works. Interestingly, I never encountered any problem with 6.6.0 even before your changes; it just worked, while 7.x didn't.
Is there anything else I could do to provide additional information? I will test on a non-virtualized sensor tomorrow as well but just wanted to let you know.
Updated by Peter Manev almost 7 years ago
Interesting - in my tests with the PR ( https://github.com/OISF/suricata/pull/3105#issuecomment-352860540 ) it seemed good.
Can you have a look at my set up there and see if anything is different ? (I used the latest pfring from git as well)
Updated by Victor Julien almost 7 years ago
Could you try disabling vlan handling as a test? '--set vlan.use-for-tracking=false' on the cmdline.
The packet numbers between afp and pfr seem more or less the same.
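For reference, the override can simply be appended to the usual pfring startup command, e.g. (a sketch only -- config path and log directory are placeholders; the capture interface comes from the yaml):

# Sketch only; adjust paths to your setup. The relevant part is the --set
# override that stops the VLAN ID from being used for flow tracking.
sudo suricata --pfring -c /etc/suricata/suricata.yaml -l /var/log/suricata -vvv \
    --set vlan.use-for-tracking=false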
Updated by Sascha Steinbiss almost 7 years ago
Victor Julien wrote:
Could you try disabling vlan handling as a test? '--set vlan.use-for-tracking=false' on the cmdline.
That indeed makes it work, yes.
Updated by Sascha Steinbiss almost 7 years ago
Peter Manev wrote:
Interesting - in my tests with the PR ( https://github.com/OISF/suricata/pull/3105#issuecomment-352860540 ) it seemed good.
Can you have a look at my set up there and see if anything is different ?
Not really. What I did notice in my VM though is that with Victor's patches it does not work when I use tcpreplay on an extra virtual 'monitoring' interface, but when I let suri run on the main VM interface and do some HTTP requests, they will show up.
Unfortunately I can't reproduce this on a physical test sensor. Even with the latest patches it doesn't seem to work either on separate monitoring interfaces (which get traffic via tcpreplay as well, but from a separate machine) or via manual traffic on the main interface.
Updated by Peter Manev almost 7 years ago
In my setup it was live traffic.
Just so I understand correctly -
On a VM it works only when you do it on the "main" interface and manually test it out to confirm.
On a HW server - it does not (even with "--set vlan.use-for-tracking=false")
?
What is your tcpreplay command?
Updated by Sascha Steinbiss almost 7 years ago
Hi Peter,
thanks for your offer to help.
Just so I understand correctly -
On a VM it works only when you do it on the "main" interface and manually test it out to confirm.
On the VM, I tried the patched Suricata version with PF_RING 7.0.0 configured to use enp0s8 as the monitoring interface, and Suricata configured to use this interface via PF_RING for capture. This resulted in no app-layer events when tcpreplaying some pcaps with HTTP and DNS requests (see below for the exact command) into that interface -- FYI, these worked with PF_RING 6.6.0. Then I reconfigured PF_RING and Suricata to use enp0s3 (NATted to the host machine) and manually did some HTTP requests via wget, which resulted in proper HTTP EVE events.
On a HW server - it does not (even with "--set vlan.use-for-tracking=false")
The hardware sensor also gets traffic via tcpreplay, but via a direct connection between the NICs of the sensor and a traffic generator machine, which also uses tcpreplay to send a loop of recorded traffic to the server. With this setup, the only thing that helped to get app-layer events in the output was setting "--set vlan.use-for-tracking=false". So this did indeed work. However, on this machine, just doing manual requests on the management interface after reconfiguring the machine to capture from there didn't work out; I could not get any app-layer events.
Note that I made sure to also capture outgoing traffic (RX+TX) in PF_RING. You can play around with the setup in the VM for yourself if you want. (The version linked above does not have the patched git version of Suricata yet).
What is your tcpreplay command?
That would be:
sudo tcpreplay -x 5 -i enp0s8 smallFlows.pcap http.cap dns.cap
where the pcaps are from public sources (and also in the VM). I also tried -t for full line speed transmission.
I guess the best way to check 'for real' would be to try it on a low risk sensor that sees live traffic, but most likely, for us, that would have to wait until after the holidays.
Thanks again and please let me know if I can help in any other way!
Sascha
Updated by Sascha Steinbiss almost 7 years ago
I guess the best way to check 'for real' would be to try it on a low risk sensor that sees live traffic, but most likely, for us, that would have to wait until after the holidays.
I just tried that today with the latest PF_RING 7.0.0 from their apt repo, with and without the patched Suricata version, but no luck. The only thing that makes this setup work is the workaround using vlan.use-for-tracking=false.
Is there anything I can do to help narrow down the issue or to reproduce it?
Updated by Peter Manev over 6 years ago
Sascha - just checking in - is that still an issue? (also don't forget to double-check (or test out) the vlan NIC offloading settings ... just in case)
Updated by Sascha Steinbiss over 6 years ago
Hi Peter, thanks for checking in. It's not an urgent issue anymore, as we took this opportunity to switch our stack back to AF_PACKET. This means I'd have to further look at this in the VM I prepared. But we haven't found a solution to this problem yet.
As far as VLAN offloading is concerned, isn't this something that is handled by Suricata at runtime? I was under the impression that newer Suri versions will disable offloading as far as possible on startup.
Anyway, I will take a look to see if that changes anything. I'm not even sure it makes any difference in the VM, or whether this is even supported by the emulated NIC's driver.
Updated by Sascha Steinbiss over 6 years ago
I just tried this after disabling VLAN offloading in the VM:
user@debian:~$ sudo ethtool -K enp0s8 rxvlan off
Actual changes:
rx-vlan-offload: off
tx-vlan-offload: off [fixed]
user@debian:~$ sudo ethtool -k enp0s8 | grep vlan
rx-vlan-offload: off
tx-vlan-offload: off [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
but still no change. Flows still time out as they did before.
Updated by Sascha Steinbiss over 6 years ago
FYI, I also updated Suricata to the latest git master (19988310d1dfe941be7fd9231a64c98aba7391d2) and PF_RING to the latest version on ntop's apt archive (7.1.0 (dev:ce9f06dbcbf8c5d9bea52fb3525f948e33d8a5a7)). No change.
Updated by Sascha Steinbiss over 6 years ago
Finally, here are the configs for my test setup:
- http://paste.debian.net/hidden/676b35d0/ (startup output of: sudo suricata/src/suricata --pfring -c suricata/suricata.yaml -l suricata -vvv)
- http://paste.debian.net/hidden/985e41eb/ (suricata.yaml)
I compiled with
./configure --enable-af-packet --enable-nfqueue --enable-prelude --enable-nflog --enable-gccprotect --disable-gccmarch-native --disable-coccinelle --enable-geoip --enable-hiredis --enable-pfring --enable-libhs --enable-luajit
Updated by Victor Julien over 6 years ago
- Target version changed from 4.1beta1 to 70
Updated by Peter Manev over 6 years ago
Sorry for the late reply -
Is it possible to
1 - while doing those tests, run tcpdump on the interface that Suricata is sniffing, from the same shell/VM you start it in.
2 - share that pcap privately if you would like?
Thanks
Updated by Konstantin Klinger over 6 years ago
The same problem occurs with AF_PACKET on some sensors with version 4.0.4. If we disable vlan tracking (vlan.use-for-tracking=false) there are app-layer events, otherwise not. Do you think the problem will be solved with version 4.1?
Updated by Konstantin Klinger over 6 years ago
I tried it with the latest suricata-master git, and the problem is still there. I have to set vlan.use-for-tracking=false if I want to have any app-layer events.
Updated by Sascha Steinbiss over 6 years ago
- File eve.json added
- File example-traffic.pcap added
Hi Peter,
coming back to the original issue. I've retried with the latest PF_RING (7.1.0 ($Revision: dev:e59670fbbdd20341e25ceec59b99c9c9d7d375af$)) and Suricata version (git master 325f336f637f).
Is it possible to
1 - while doing those tests, run tcpdump on the interface that Suricata is sniffing, from the same shell/VM you start it in.
Sure, did that, on the same VM and interface.
2 - share that pcap privately if you would like?
Sure, see the attached eve.json with timed-out flows and the corresponding pcap (sharing publicly, as there's nothing private/confidential in there). Ignore the SSH stuff in there; that's me connecting to the VM. The example traffic I expected to see is an HTTPS request to de.indeed.com.
Updated by Peter Manev over 6 years ago
Is it possible to do a run (replay) with "vlan.use-for-tracking=false" and another (same replay) with "vlan.use-for-tracking=true" - and share the two full stats/eve.json logs, one set for each run?
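One possible way to produce the two log sets (a sketch that assumes the replay setup described earlier in this thread; paths, interface and pcap names are placeholders):

# Sketch only: start each Suricata run in one shell, replay from another,
# then stop Suricata and compare <logdir>/stats.log and <logdir>/eve.json.
mkdir -p run-vlan-true run-vlan-false

# Run 1: VLAN-based flow tracking enabled (the default).
sudo suricata --pfring -c suricata/suricata.yaml -l run-vlan-true -vvv --set vlan.use-for-tracking=true
sudo tcpreplay -i enp0s8 example-traffic.pcap

# Run 2: identical replay, VLAN-based flow tracking disabled.
sudo suricata --pfring -c suricata/suricata.yaml -l run-vlan-false -vvv --set vlan.use-for-tracking=false
sudo tcpreplay -i enp0s8 example-traffic.pcap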
Updated by Victor Julien over 6 years ago
- Assignee changed from Victor Julien to OISF Dev
Updated by Andreas Herz almost 6 years ago
Did you try the suggestions by Peter?
Updated by Andreas Herz almost 5 years ago
- Status changed from Feedback to Closed
Updated by Victor Julien almost 5 years ago
- Assignee deleted (OISF Dev)
- Target version deleted (TBD)