
Bug #4096

flow manager: 200% CPU in KVM host with no activity with Suricata 6

Added by Andrew Welham 8 months ago. Updated 4 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
Affected Versions:
Effort:
Difficulty:
Label:
Needs backport to 6.0

Description

An Ubuntu 20.04 server was upgraded using the Suricata PPA. CPU utilisation is now at 200% with no activity.

suricata:
Installed: 6.0.0-0ubuntu2
Candidate: 6.0.0-0ubuntu2
Version table: *** 6.0.0-0ubuntu2 500
500 http://ppa.launchpad.net/oisf/suricata-stable/ubuntu focal/main amd64 Packages
100 /var/lib/dpkg/status


Files

KVM Client HTOP.jpg (153 KB) KVM Client HTOP.jpg Andrew Welham, 11/03/2020 07:45 AM
KVM Host HTOP.jpg (228 KB) KVM Host HTOP.jpg Andrew Welham, 11/03/2020 07:45 AM
stats.log (10.2 KB) stats.log Andrew Welham, 11/03/2020 07:45 AM
suricata.log (3.54 KB) suricata.log Andrew Welham, 11/03/2020 07:45 AM
Suricata 5 GW.Client.jpg (379 KB) Suricata 5 GW.Client.jpg Andrew Welham, 11/03/2020 09:16 PM
Surocata 5 KVM host.jpg (190 KB) Surocata 5 KVM host.jpg Andrew Welham, 11/03/2020 09:20 PM
kvm-VM-config.xml (8.9 KB) kvm-VM-config.xml KVM vm config Andrew Welham, 12/14/2020 11:28 AM
VirtualBox.xml (3.4 KB) VirtualBox.xml virtual VM config Andrew Welham, 12/14/2020 11:28 AM

Related issues

Copied to Bug #4314: flow manager: 200% CPU in KVM host with no activity with Suricata 6 (Closed, assignee Victor Julien)
#1

Updated by Andrew Welham 8 months ago

I've just noticed this on another server: same OS build, different hardware architecture (one AMD, one Intel).
I restored to a backup of:
apt-cache policy suricata
suricata:
Installed: 5.0.3-0ubuntu0
Candidate: 5.0.3-0ubuntu0
Version table: *** 5.0.3-0ubuntu0 500
500 http://ppa.launchpad.net/oisf/suricata-stable/ubuntu focal/main amd64 Packages
100 /var/lib/dpkg/status

and put the suricata package on hold in apt.
I don't have any other information; there is nothing special in the standard logs. Happy to provide more info if required, if you let me know what you're looking for.

#2

Updated by Andrew Welham 8 months ago

Just noticed a few typos in the subject it should read
Ubuntu 20.04 PPA upgraded to Suricata 6 now @ 200% CPU with no activity

#3

Updated by Andrew Welham 8 months ago

Not sure if this helps: Suricata is hosted on KVM virtual machines, and the virtual machine does not appear to have high CPU, just the KVM host. The CPU on the KVM host drops dramatically when Suricata is stopped.

If I upgrade from v5 to v6 via apt, the CPU on the KVM host jumps as soon as Suricata is restarted on v6.

#4

Updated by Peter Manev 8 months ago

Do you mind sharing a screenshot of htop while 6.0 is running, and also the suricata.log/stats.log files?

#5

Updated by Peter Manev 8 months ago

  • Subject changed from Ubuntu 20.04 PPS upgraded to Suricata 6 now @ 2000% CPU with no activity to Ubuntu 20.04 PPS upgraded to Suricata 6 now @ 200% CPU with no activity

Updated the subject as requested.

#6

Updated by Andrew Welham 8 months ago

Peter, here you go.
I did a fresh install of Ubuntu 20.04 (on a test VM), fully patched, and installed Suricata:

sudo apt-get install software-properties-common
sudo add-apt-repository ppa:oisf/suricata-stable
sudo apt-get update
sudo apt-get install suricata

To start suricata

/usr/bin/suricata -c /etc/suricata/suricata.yaml -q 0 -D

As soon as Suricata was started, the CPU on the KVM host jumped; when Suricata was killed, the CPU went down again, and when it was restarted it went back up again.

I've attached two htop images:
KVM Host HTOP.jpg, which is the host used to run the test VM
KVM Client HTOP.jpg, which is the VM running Suricata

It's not a false reading in top or htop, as the heat output from the host increases when Suricata is running.

No traffic was passed through Suricata.

#7

Updated by Victor Julien 8 months ago

  • Subject changed from Ubuntu 20.04 PPS upgraded to Suricata 6 now @ 200% CPU with no activity to Ubuntu 20.04 PPA upgraded to Suricata 6 now @ 200% CPU with no activity
#8

Updated by Peter Manev 8 months ago

Just confirming: with no traffic, in IPS mode with no rules, the KVM host gets busy but not the client, correct?
Does it behave the same in regular af-packet/IDS mode?

#9

Updated by Andrew Welham 8 months ago

I don't run Suricata in IDS mode; however, the screenshots you have seen are from an unconfigured Suricata. I just started it with the same commands used on the production system. The result, though, is the same as on my production systems.
I wanted to create a repeatable test for the devs, so I installed exactly as shown above, with no configuration or rules at all.

Happy to start with a different set of commands if you let me know the startup commands you want.

As for the CPU utilisation, the client appears OK. However, I have noticed KVM allows spare capacity to be used (if available) on the host, and the KVM host is suffering, i.e. 200% for doing nothing.

As a comparison, I have uploaded a snapshot of my production systems running Suricata 5. As you can see, the utilisation on the Suricata server is higher and there are more instances, but the KVM server is not showing much (in the screenshot). Though to be fair, the CPU jumps all over the place between about 2% and 145% (typically under 90%), changing all the time as the load changes on the Suricata server.

#10

Updated by Andrew Welham 8 months ago

I tried this command to start Suricata (I think this is IDS mode):
/usr/bin/suricata -c /etc/suricata/suricata.yaml -i enp1s0

I've also tried to get some more detailed logs, but enabling debugging via the YAML config file or the environment makes no difference. I may be doing something wrong, though.

#11

Updated by Peter Manev 8 months ago

If you try

/usr/bin/suricata -c /etc/suricata/suricata.yaml -i enp1s0

Does it behave the same? (Sorry, I did not understand.)

#12

Updated by Andrew Welham 8 months ago

Yes, it behaved the same.
I also tried downloading the source and compiling, and the same behaviour occurred.

#13

Updated by Jungho Yoon 8 months ago

Some settings are different, but it seems to be a similar situation to my case.

https://forum.suricata.io/t/cpu-usage-of-version-6-0-0/706

#14

Updated by Peter Manev 8 months ago

@Jungho Yoon - in your case it seems you have traffic; here it is without traffic - am I correct?

#15

Updated by Peter Manev 8 months ago

@Andrew Welham - what does the output of

suricata --build-info

look like?

#16

Updated by Andrew Welham 8 months ago

Apologies, I may have caused a little confusion here. The bug was initially detected on live production systems (in a keepalived cluster) with data flowing through them, as in my initial post.
However, as this bug was reproducible on a basic system (a fresh install on a test VM) with no traffic, I used those details, since the amount of traffic does not look to be the cause.

So it is potentially the same issue as Jungho's.


Peter, the requested build info (from a standard PPA install) is:
/usr/bin/suricata --build-info
This is Suricata version 6.0.0 RELEASE
Features: NFQ PCAP_SET_BUFF AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG LIBNET1.1 HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HAVE_LUA HAVE_LUAJIT HAVE_LIBJANSSON TLS TLS_C11 MAGIC RUST
SIMD support: none
Atomic intrinsics: 1 2 4 8 byte(s)
64-bits, Little-endian architecture
GCC version 9.3.0, C version 201112
compiled with _FORTIFY_SOURCE=2
L1 cache line size (CLS)=64
thread local storage method: _Thread_local
compiled with LibHTP v0.5.35, linked against LibHTP v0.5.35

Suricata Configuration:
  AF_PACKET support:                       yes
  eBPF support:                            no
  XDP support:                             no
  PF_RING support:                         no
  NFQueue support:                         yes
  NFLOG support:                           no
  IPFW support:                            no
  Netmap support:                          no
  DAG enabled:                             no
  Napatech enabled:                        no
  WinDivert enabled:                       no

  Unix socket enabled:                     yes
  Detection enabled:                       yes

  Libmagic support:                        yes
  libnss support:                          yes
  libnspr support:                         yes
  libjansson support:                      yes
  hiredis support:                         yes
  hiredis async with libevent:             yes
  Prelude support:                         no
  PCRE jit:                                yes
  LUA support:                             yes, through luajit
  libluajit:                               yes
  GeoIP2 support:                          yes
  Non-bundled htp:                         yes
  Old barnyard2 support:
  Hyperscan support:                       yes
  Libnet support:                          yes
  liblz4 support:                          yes

  Rust support:                            yes
  Rust strict mode:                        no
  Rust compiler path:                      /usr/bin/rustc
  Rust compiler version:                   rustc 1.43.0
  Cargo path:                              /usr/bin/cargo
  Cargo version:                           cargo 1.43.0
  Cargo vendor:                            yes

  Python support:                          yes
  Python path:                             /usr/bin/python3
  Python distutils                         yes
  Python yaml                              yes
  Install suricatactl:                     yes
  Install suricatasc:                      yes
  Install suricata-update:                 yes

  Profiling enabled:                       no
  Profiling locks enabled:                 no

  Plugin support (experimental):           yes

Development settings:
  Coccinelle / spatch:                     no
  Unit tests enabled:                      no
  Debug output enabled:                    no
  Debug validation enabled:                no

Generic build parameters:
  Installation prefix:                     /usr
  Configuration directory:                 /etc/suricata/
  Log directory:                           /var/log/suricata/

  --prefix                                 /usr
  --sysconfdir                             /etc
  --localstatedir                          /var
  --datarootdir                            /usr/share

  Host:                                    x86_64-pc-linux-gnu
  Compiler:                                gcc (exec name) / g++ (real)
  GCC Protect enabled:                     yes
  GCC march native enabled:                no
  GCC Profile enabled:                     no
  Position Independent Executable enabled: yes
  CFLAGS                                   -g -O2 -fdebug-prefix-map=/build/suricata-ngEYgo/suricata-6.0.0=. -fstack-protector-strong -Wformat -Werror=format-security -std=c11 -I${srcdir}/../rust/gen -I${srcdir}/../rust/dist
  PCAP_CFLAGS                               -I/usr/include
  SECCFLAGS                                -fstack-protector -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security

#17

Updated by Jungho Yoon 8 months ago

By "different settings" I mean that the OS and suricata.yaml are not exactly the same. My case, written up on the forum, was inline (IPS mode) only. In the inline (IPS mode) configuration, both with and without traffic, the KVM host showed the same occupancy, as if the CPU were polling (using NIC/Suricata CPU affinity). CPU usage did not increase as much as under KVM, but it also seemed to increase under Hyper-V and ESXi after upgrading to 6.0.0.

#18

Updated by Andrew Welham 8 months ago

I don't know if this helps. Using the perf command Jungho gave, I tried the following.
On a freshly built VM hosted in KVM, running Ubuntu 20.04 fully patched, I installed Suricata 5 from the PPA (archive), and did not configure it or add rules.
I ran the following:
/usr/bin/suricata -c /etc/suricata/suricata.yaml -q 0 -D
NO TRAFFIC WOULD PASS
I let it settle for 30 seconds and ran:
perf stat -e 'sched:*' -p PID sleep 10
The results were
Performance counter stats for process id '46427':

0      sched:sched_kthread_stop
0 sched:sched_kthread_stop_ret
735 sched:sched_waking
723 sched:sched_wakeup
0 sched:sched_wakeup_new
1,096 sched:sched_switch
0 sched:sched_migrate_task
0 sched:sched_process_free
0 sched:sched_process_exit
0 sched:sched_wait_task
0 sched:sched_process_wait
0 sched:sched_process_fork
0 sched:sched_process_exec
0 sched:sched_stat_wait
0 sched:sched_stat_sleep
0 sched:sched_stat_iowait
0 sched:sched_stat_blocked
46,246,224 sched:sched_stat_runtime
0 sched:sched_pi_setprio
0 sched:sched_process_hang
0 sched:sched_move_numa
0 sched:sched_stick_numa
0 sched:sched_swap_numa
0 sched:sched_wake_idle_without_ipi
10.002187522 seconds time elapsed

I then uninstalled Suricata 5.
I installed Suricata 6 from the PPA, and did not configure it or add rules.
I ran the following:
/usr/bin/suricata -c /etc/suricata/suricata.yaml -q 0 -D
NO TRAFFIC WOULD PASS
I let it settle for 30 seconds and ran:
perf stat -e 'sched:*' -p PID sleep 10
Performance counter stats for process id '46739':

0      sched:sched_kthread_stop
0 sched:sched_kthread_stop_ret
19 sched:sched_waking
11 sched:sched_wakeup
0 sched:sched_wakeup_new
128,412 sched:sched_switch
0 sched:sched_migrate_task
0 sched:sched_process_free
0 sched:sched_process_exit
0 sched:sched_wait_task
0 sched:sched_process_wait
0 sched:sched_process_fork
0 sched:sched_process_exec
0 sched:sched_stat_wait
0 sched:sched_stat_sleep
0 sched:sched_stat_iowait
0 sched:sched_stat_blocked
948,229,557 sched:sched_stat_runtime
0 sched:sched_pi_setprio
0 sched:sched_process_hang
0 sched:sched_move_numa
0 sched:sched_stick_numa
0 sched:sched_swap_numa
0 sched:sched_wake_idle_without_ipi
10.001963231 seconds time elapsed
There are some considerable differences; I hope this helps.

#19

Updated by Andrew Welham 6 months ago

Any sign of this issue being investigated?

#20

Updated by Peter Manev 6 months ago

Can you please share a bit more info on how to reproduce this on the KVM host? Any details/specifics would help, as the issue seems most severe with the KVM and 6.0 combination.

#21

Updated by Andrew Welham 6 months ago

Peter,
I've done a few more tests on KVM and also VirtualBox (on a Windows host).
VirtualBox did not seem to suffer as much, as demonstrated by the results below, but there was still a difference when tested with perf stat -e 'sched:*' -p PID sleep 10. I have also included the config files for both VMs. They are standard VMs with extra memory and CPU and a bridged NIC (a single NIC in this test case; 10 in the case of my production machine). The results are, however, very similar, so it does not look to be down to the number of NICs.
Let me know if you want any further information.

kvm suricata 6
Performance counter stats for process id '20841':

0      sched:sched_kthread_stop
0 sched:sched_kthread_stop_ret
21 sched:sched_waking
9 sched:sched_wakeup
0 sched:sched_wakeup_new
129,069 sched:sched_switch
0 sched:sched_migrate_task
0 sched:sched_process_free
0 sched:sched_process_exit
0 sched:sched_wait_task
0 sched:sched_process_wait
0 sched:sched_process_fork
0 sched:sched_process_exec
0 sched:sched_stat_wait
0 sched:sched_stat_sleep
0 sched:sched_stat_iowait
0 sched:sched_stat_blocked
849,350,259 sched:sched_stat_runtime
0 sched:sched_pi_setprio
0 sched:sched_process_hang
0 sched:sched_move_numa
0 sched:sched_stick_numa
0 sched:sched_swap_numa
1 sched:sched_wake_idle_without_ipi
10.005035852 seconds time elapsed

kvm suricata 5

perf stat -e 'sched:*' -p 1447 sleep 10

Performance counter stats for process id '1447':

0      sched:sched_kthread_stop
0 sched:sched_kthread_stop_ret
28 sched:sched_waking
27 sched:sched_wakeup
0 sched:sched_wakeup_new
13,474 sched:sched_switch
7 sched:sched_migrate_task
0 sched:sched_process_free
0 sched:sched_process_exit
0 sched:sched_wait_task
0 sched:sched_process_wait
0 sched:sched_process_fork
0 sched:sched_process_exec
0 sched:sched_stat_wait
0 sched:sched_stat_sleep
0 sched:sched_stat_iowait
0 sched:sched_stat_blocked
258,451,732 sched:sched_stat_runtime
0 sched:sched_pi_setprio
0 sched:sched_process_hang
0 sched:sched_move_numa
0 sched:sched_stick_numa
0 sched:sched_swap_numa
0 sched:sched_wake_idle_without_ipi
10.003212991 seconds time elapsed

virtualbox suricata 6

Performance counter stats for process id '1447':
0      sched:sched_kthread_stop
0 sched:sched_kthread_stop_ret
34 sched:sched_waking
38 sched:sched_wakeup
0 sched:sched_wakeup_new
8,878 sched:sched_switch
20 sched:sched_migrate_task
0 sched:sched_process_free
0 sched:sched_process_exit
0 sched:sched_wait_task
0 sched:sched_process_wait
0 sched:sched_process_fork
0 sched:sched_process_exec
0 sched:sched_stat_wait
0 sched:sched_stat_sleep
0 sched:sched_stat_iowait
0 sched:sched_stat_blocked
202,047,094 sched:sched_stat_runtime
0 sched:sched_pi_setprio
0 sched:sched_process_hang
0 sched:sched_move_numa
0 sched:sched_stick_numa
0 sched:sched_swap_numa
0 sched:sched_wake_idle_without_ipi
10.006997425 seconds time elapsed

Virtual box Suricata 5
perf stat -e 'sched:*' -p 22517 sleep 10

Performance counter stats for process id '22517':
0      sched:sched_kthread_stop
0 sched:sched_kthread_stop_ret
9,654 sched:sched_waking
9,644 sched:sched_wakeup
0 sched:sched_wakeup_new
1,097 sched:sched_switch
0 sched:sched_migrate_task
0 sched:sched_process_free
0 sched:sched_process_exit
0 sched:sched_wait_task
0 sched:sched_process_wait
0 sched:sched_process_fork
0 sched:sched_process_exec
0 sched:sched_stat_wait
0 sched:sched_stat_sleep
0 sched:sched_stat_iowait
0 sched:sched_stat_blocked
150,835,414 sched:sched_stat_runtime
0 sched:sched_pi_setprio
0 sched:sched_process_hang
0 sched:sched_move_numa
0 sched:sched_stick_numa
0 sched:sched_swap_numa
1 sched:sched_wake_idle_without_ipi
10.002084152 seconds time elapsed

#22

Updated by Andrew Welham 6 months ago

I was reading https://forum.suricata.io/t/cpu-usage-of-version-6-0-0/706/20
and also other articles on the internet about usleep() and some instances of high CPU.
As a test, I built Suricata from source and changed the values in the usleep() calls inside flow-manager.c, and the CPU dropped dramatically.
I tried various figures, ranging from doubling the values to adding an extra 0 (nothing scientific).
I realise this only slows down Suricata, but the CPU did drop, in some cases from 200% to 25%; nothing perfect, but an indicator of the direction.

I have not tested it, but based on the article, Victor Julien may be on to something.

#23

Updated by Victor Julien 6 months ago

  • Status changed from New to Assigned
  • Assignee set to Victor Julien
  • Target version set to 7.0rc1
  • Label Needs backport to 6.0 added

I can reproduce this. It looks like changing the usleep value to 200 already gives a big improvement, but setting it much higher to something like 5000 gives better results. Need to look at what the impact is of changing this.

#24

Updated by Victor Julien 6 months ago

  • Subject changed from Ubuntu 20.04 PPA upgraded to Suricata 6 now @ 200% CPU with no activity to flow manager: 200% CPU in KVM host with no activity with Suricata 6
#25

Updated by Victor Julien 6 months ago

It is not related to the PPA; it happens with builds from source as well.

#26

Updated by Michael Tremer 6 months ago

Hello boys,

I would like to let you know that we have been experiencing the same issue with IPFire installations, too.

We have upgraded from Suricata 5 to 6 and did not change a line in the configuration file. On regular hardware, load has increased quite a bit, and it is through the roof on virtual machines.

Our configuration file is here (https://git.ipfire.org/?p=ipfire-2.x.git;a=blob;f=config/suricata/suricata.yaml;h=4e9e399675551c8a5bfd81568da622b2f3767576;hb=HEAD) and it should be fairly close to the stock configuration.

We have not looked too deep into this yet, but we are tracking it all here (https://bugzilla.ipfire.org/show_bug.cgi?id=12548).

#27

Updated by Jeff Lucovsky 5 months ago

  • Copied to Bug #4314: flow manager: 200% CPU in KVM host with no activity with Suricata 6 added
#29

Updated by Michael Tremer 4 months ago

Thank you for the fix. We have compiled and tested it, but I am afraid I have to report that it does not fix this; it only makes it very slightly better.

This graph shows CPU usage on an idle machine running Suricata 5: https://bugzilla.ipfire.org/attachment.cgi?id=858

This shows CPU usage on the same machine with suricata 6.0.1 and the patch: https://bugzilla.ipfire.org/attachment.cgi?id=859

Therefore, I would like to request to reopen this ticket and search for a solution that brings us back down to levels of suricata 5. Apart from slowing down other applications because of the CPU being busy running in a loop, this is going to waste a lot of energy across our whole installation base.

Of course, we are happy to help with verifying any proposed solutions. Thank you for looking into this.

#30

Updated by Victor Julien 4 months ago

Thanks for the follow-up, Michael Tremer. I've created #4379 to track this.

#31

Updated by Michael Tremer 4 months ago

Thank you!
