Support #3320

closed

Significant packet loss when using Suricata with Rust enabled

Added by Eric Urban about 5 years ago. Updated about 4 years ago.

Status: Closed
Priority: Normal
Affected Versions:
Label:

Description

Summary
At times we experience significant packet loss with Rust enabled in Suricata. In our environment we have two instances with the same version, configuration file, loaded rules, and traffic: the one without Rust enabled has little to no packet loss, while the one with Rust enabled experiences packet loss. Disabling Rust on the host with packet loss has been shown to correct the issue.

Details

We are currently running two 4.1.5 instances side by side with the same configuration, loaded rules, and traffic. In both cases Suricata was compiled with the options "HAVE_PYTHON=/usr/bin/python3 ./configure --with-libpcap=/opt/snf --localstatedir=/var/ --with-libhs-includes=/usr/local/include/hs/ --with-libhs-libraries=/usr/local/lib64/", but one had Rust/Cargo present during compilation and the other did not. We also have a 5.0.0 instance, where Rust is required and enabled by default, with the same config/rules/traffic, and it experiences drops as well. The same behavior was also seen on 4.1.2, where we did a side-by-side comparison of using Rust vs. not using it.

Unfortunately, our current comparison is being done on hosts with different hardware. However, we did run this comparison on identical hardware back when using 4.1.2 and got the same result: the Rust-enabled instance produced many more drops. I also believe both hosts in our current test setup are more than adequately sized. The Rust-enabled host has 40 cores, 128GB of memory, and 1 instance of Suricata. The non-Rust host has 88 cores, 256GB of memory, and 4 instances of Suricata, though only one of the four instances receives traffic mirroring that of our Rust-enabled instance.

Both the Suricata stats and our Myricom stats show drops. There could be a counter issue of some kind, because the number of packets also increases significantly during these periods of large drops. However, when I compared packets received minus packets dropped across the two hosts, the Rust-enabled instance still had noticeably fewer total packets in most cases, so something else seems to be going on. For example, on Nov 4 over the minute of 11:06, the sum of stats.capture.kernel_packets_delta and stats.capture.kernel_drops_delta was 1,440,535 packets for the Rust instance vs. 8,237,600 for the instance without Rust.
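
A minimal sketch of how per-minute totals like the one above could be pulled out of the eve logs, assuming one JSON record per line and stats records written with deltas enabled (so that stats.capture carries kernel_packets_delta and kernel_drops_delta). The function name is hypothetical, and this is not necessarily how the figures above were produced.

```python
import json
from collections import defaultdict


def per_minute_capture_totals(lines):
    """Sum kernel packet and drop deltas from EVE stats records, per minute.

    Assumes one JSON record per line and stats records logged with deltas
    enabled, so stats.capture carries kernel_packets_delta and
    kernel_drops_delta. Missing counters are treated as 0.
    """
    totals = defaultdict(lambda: {"packets": 0, "drops": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("event_type") != "stats":
            continue  # skip alerts, flows and other event types
        capture = record.get("stats", {}).get("capture", {})
        # The first 16 characters of an EVE timestamp are "YYYY-MM-DDTHH:MM",
        # i.e. the minute in the sensor's local time.
        minute = record["timestamp"][:16]
        totals[minute]["packets"] += capture.get("kernel_packets_delta", 0)
        totals[minute]["drops"] += capture.get("kernel_drops_delta", 0)
    return dict(totals)
```

Running it over each host's eve.json (e.g. `per_minute_capture_totals(open("eve.json"))["2019-11-04T11:06"]`) gives the packet and drop sums for that minute on each host, which can then be added or subtracted to match whichever comparison is being made.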

During the periods of drops, the Rust-enabled instance has fewer alerts. The difference varies quite a bit depending on which time period and which drop period is analyzed. For example, between 09:00 and 10:00 on November 4, while drops were happening, the Rust instance had 13,601 alerts and the one without Rust had 15,820. Outside of drop periods, in the times I sampled, the Rust host generally has slightly more alerts, but the difference is around 1% or less. I am guessing this small difference during normal operation isn't unusual, since enabling Rust changes the analyzers used for some protocols.
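
For anyone repeating the alert comparison, a minimal sketch of counting alerts in a time window from eve.json follows. It assumes alert records have event_type "alert" and timestamps in the usual EVE form with a numeric UTC offset (e.g. 2019-11-04T09:15:02.123456-0600); the helper names are hypothetical, and the offset used for the window must match the sensor's local time.

```python
import json
from datetime import datetime

# Assumed EVE timestamp layout, e.g. "2019-11-04T09:15:02.123456-0600"
EVE_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_eve_ts(ts):
    """Parse an EVE timestamp string into a timezone-aware datetime."""
    return datetime.strptime(ts, EVE_TS_FORMAT)


def count_alerts(lines, start, end):
    """Count EVE alert records whose timestamp falls in [start, end).

    start and end must be timezone-aware datetimes.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("event_type") != "alert":
            continue  # only alerts are counted
        if start <= parse_eve_ts(record["timestamp"]) < end:
            count += 1
    return count
```

The window is half-open, so an alert stamped exactly 10:00:00 is left for the next hour rather than being counted twice when consecutive hours are compared.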

I did seek help through the mailing list earlier this year, in a thread starting at https://lists.openinfosecfoundation.org/pipermail/oisf-users/2019-February/016618.html. That thread had some activity over at least a few months, but there was no resolution and it became quite long, so it may be best to start from scratch rather than read through it.

Some additional info that applies to both 4.1.5 instances:
- CentOS Linux release 7.7.1908 (Core) / 3.10.0-1062.1.2.el7.x86_64 #1 SMP Mon Sep 30 14:19:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Pcap capture method (using --pcap command-line option) with workers runmode
- Myricom cards (ProductCode 10G-PCIE2-8C2-2S, driver myri_snf, version 3.0.18.50878)
- Rust/cargo versions:
  - Rust compiler: rustc 1.38.0
  - Rust cargo: cargo 1.38.0

I will attach the stats from the eve logs and the Myricom stats logs for both hosts. Note that the counters ending in __per_second in the Myricom log are not standard and should not be used. Build-info output is also included. I can provide the configuration directly (not through Redmine) if requested.
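
To double-check which build actually has Rust enabled, the attached build-info output can be scanned. The sketch below is a hypothetical helper, assuming the build-info text contains a configure-summary line whose label begins with "Rust support" followed by yes or no; the exact label may differ between versions, so it should be verified against the attached files.

```python
def rust_enabled(build_info):
    """Return True/False from an assumed "Rust support: yes|no" line.

    Returns None if no such line is found in the build-info text.
    """
    for line in build_info.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without a colon carry no key/value pair
        # Prefix match also covers variants such as "Rust support (experimental)"
        if key.strip().startswith("Rust support"):
            return value.strip().lower() == "yes"
    return None
```

Returning None rather than False keeps "label not found" distinct from "Rust explicitly disabled", which matters if the label differs between 4.1.x and 5.0.0 builds.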

Steps to reproduce
We are not certain how to reproduce this other than by building Suricata with Rust enabled.


Files

rust_eveMyricomStats.tar.gz (7.03 MB): Stats and build-info from the Rust-enabled host (Eric Urban, 11/05/2019 08:57 PM)
noRust_eveMyricomStats.tar.gz (6.8 MB): Stats and build-info from the host with Rust not enabled (Eric Urban, 11/05/2019 08:57 PM)