Bug #5619

dpdk/ips: crash at shutdown with mlx

Added by Victor Julien over 1 year ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Affected Versions:
Effort:
Difficulty:
Label:

Description

I consistently see this crash when using this NIC:

0000:85:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens3f0np0 drv=mlx5_core unused=vfio-pci 
0000:85:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens3f1np1 drv=mlx5_core unused=vfio-pci 

The crash requires some traffic to have been seen: a start and stop with no traffic received works fine.

victor@z840:~/dev/suricata$ sudo ./src/suricata -c suricata-ips-mlx-25G-profile-nsm-full.yaml -l /home/victor/vm/logs/ --disable-detection --set flow.memcap=32gb --set stream.memcap=16gb --set stream.reassembly.memcap=64gb --dpdk -v
27/10/2022 -- 13:49:17 - <Notice> - This is Suricata version 7.0.0-beta1 RELEASE running in SYSTEM mode
27/10/2022 -- 13:49:17 - <Info> - CPUs/cores online: 56
27/10/2022 -- 13:49:17 - <Info> - Shortening device name to: 0000..00.0
27/10/2022 -- 13:49:17 - <Info> - Shortening device name to: 0000..00.1
27/10/2022 -- 13:49:17 - <Info> - eve-log output device (regular) initialized: /dev/null
27/10/2022 -- 13:49:17 - <Info> - stats output device (regular) initialized: stats.log
EAL: No available hugepages reported in hugepages-1048576kB
27/10/2022 -- 13:49:18 - <Info> - DPDK IPS mode activated between 0000:85:00.0 and 0000:85:00.1
27/10/2022 -- 13:49:18 - <Info> - Creating a packet mbuf pool mempool_0000:85:00.0 of size 262143, cache size 511, mbuf size 2176
27/10/2022 -- 13:49:18 - <Info> - DPDK IPS mode activated between 0000:85:00.1 and 0000:85:00.0
27/10/2022 -- 13:49:18 - <Info> - Going to use 27 thread(s) for device 0000:85:00.0
27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected
[... previous warning repeated 13 more times ...]
27/10/2022 -- 13:49:19 - <Info> - DPDK IPS mode activated between 0000:85:00.1 and 0000:85:00.0
27/10/2022 -- 13:49:19 - <Info> - Creating a packet mbuf pool mempool_0000:85:00.1 of size 262143, cache size 511, mbuf size 2176
27/10/2022 -- 13:49:20 - <Info> - DPDK IPS mode activated between 0000:85:00.0 and 0000:85:00.1
27/10/2022 -- 13:49:20 - <Info> - Going to use 27 thread(s) for device 0000:85:00.1
27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected
[... previous warning repeated 12 more times ...]
27/10/2022 -- 13:49:21 - <Info> - Running in live mode, activating unix socket
27/10/2022 -- 13:49:21 - <Info> - Using unix socket file '/var/run/suricata/suricata-command.socket'
27/10/2022 -- 13:49:21 - <Notice> - Threads created -> W: 54 FM: 1 FR: 1   Engine started.
^C27/10/2022 -- 13:49:36 - <Notice> - Signal Received.  Stopping engine.
27/10/2022 -- 13:49:36 - <Info> - time elapsed 18.715s
27/10/2022 -- 13:49:40 - <Info> - Alerts: 0
27/10/2022 -- 13:49:40 - <Notice> - Stats for '0000:85:00.0':  pkts: 7077119, drop: 5858965 (82.79%), invalid chksum: 0
27/10/2022 -- 13:49:40 - <Info> - Closing device 0000:85:00.0
Segmentation fault
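As an aside on the EAL line above: "No available hugepages reported in hugepages-1048576kB" only means that no 1 GiB hugepages were reserved; DPDK falls back to whatever other page sizes are configured. The current reservation can be inspected with standard Linux procfs paths (generic commands, nothing Suricata-specific):

```shell
# Hugepage summary across all configured page sizes:
grep -i huge /proc/meminfo

# Number of reserved default-size (usually 2 MiB) hugepages:
cat /proc/sys/vm/nr_hugepages
```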
Actions #1

Updated by Victor Julien over 1 year ago

Backtrace:

27/10/2022 -- 15:11:56 - <Info> - Alerts: 0
[Thread 0x7ffedcff9700 (LWP 4171194) exited]
27/10/2022 -- 15:11:56 - <Notice> - Stats for '0000:85:00.0':  pkts: 10427224, drop: 3915191 (37.55%), invalid chksum: 0
27/10/2022 -- 15:11:56 - <Info> - Closing device 0000:85:00.0

Thread 1 "Suricata-Main" received signal SIGSEGV, Segmentation fault.
0x00007ffff3560359 in ?? () from /usr/lib/x86_64-linux-gnu/dpdk/pmds-20.0/librte_pmd_mlx5.so
(gdb) bt
#0  0x00007ffff3560359 in ?? () from /usr/lib/x86_64-linux-gnu/dpdk/pmds-20.0/librte_pmd_mlx5.so
#1  0x00007ffff3690858 in ?? () from /usr/lib/x86_64-linux-gnu/dpdk/pmds-20.0/librte_pmd_mlx5.so
#2  0x00007ffff3556680 in ?? () from /usr/lib/x86_64-linux-gnu/dpdk/pmds-20.0/librte_pmd_mlx5.so
#3  0x00007ffff6dd5d8d in rte_eth_dev_close () from /lib/x86_64-linux-gnu/librte_ethdev.so.20.0
#4  0x0000000000552a2e in DPDKCloseDevice (ldev=ldev@entry=0xdb8340) at util-dpdk.c:54
#5  0x00000000005522c1 in LiveDeviceListClean () at util-device.c:364
#6  0x0000000000543f7c in GlobalsDestroy (suri=<optimized out>) at suricata.c:364
#7  0x00000000005439a2 in SuricataMain (argc=<optimized out>, argv=<optimized out>) at suricata.c:2960
#8  0x00007ffff6aad083 in __libc_start_main (main=0x5400a0 <main>, argc=14, argv=0x7fffffffe3a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe398) at ../csu/libc-start.c:308
#9  0x000000000053ffde in _start ()

Actions #2

Updated by Lukas Sismis over 1 year ago

I've got a card from the same family (a ConnectX-4, where yours is a ConnectX-4 Lx):

0000:b3:00.0 'MT27700 Family [ConnectX-4] 1013' if=ens1f0 drv=mlx5_core unused= 
0000:b3:00.1 'MT27700 Family [ConnectX-4] 1013' if=ens1f1 drv=mlx5_core unused= 

I tried hooking it up in IPS mode, but I couldn't reproduce the segfault so far. I tried both single-packet and continuous traffic capture, with and without rules, with 4 workers enabled.

The only difference I see is that you had a multi-NUMA-node machine, while the machine I had available for testing has only a single NUMA node.
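The NUMA placement is easy to confirm from sysfs; the commands below are generic Linux, and the PCI address is simply the one from the original report, so substitute your own:

```shell
# NUMA nodes known to the kernel (node0, node1, ...):
ls -d /sys/devices/system/node/node* 2>/dev/null

# NUMA node a given NIC is attached to (-1 means no affinity reported);
# 0000:85:00.0 is the address from the report above.
pci="0000:85:00.0"
if [ -r "/sys/bus/pci/devices/$pci/numa_node" ]; then
    cat "/sys/bus/pci/devices/$pci/numa_node"
else
    echo "device $pci not present on this host"
fi
```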

24/11/2022 -- 18:25:28 - <Info> - 21315 signatures processed. 1417 are IP-only rules, 3811 are inspecting packet payload, 15857 inspect application layer, 104 are decoder event only                                                                 

EAL: No available 1048576 kB hugepages reported
mlx5_net: No available register for sampler.
mlx5_net: No available register for sampler.
TELEMETRY: No legacy callbacks, legacy socket not created
24/11/2022 -- 18:25:42 - <Info> - DPDK IPS mode activated between 0000:b3:00.0 and 0000:b3:00.1
24/11/2022 -- 18:25:42 - <Info> - Creating a packet mbuf pool mempool_0000:b3:00.0 of size 65535, cache size 257, mbuf size 2176                                                                                                                      
24/11/2022 -- 18:25:42 - <Info> - DPDK IPS mode activated between 0000:b3:00.1 and 0000:b3:00.0
24/11/2022 -- 18:25:42 - <Info> - Going to use 4 thread(s) for device 0000:b3:00.0
24/11/2022 -- 18:25:42 - <Info> - DPDK IPS mode activated between 0000:b3:00.1 and 0000:b3:00.0
24/11/2022 -- 18:25:42 - <Info> - Creating a packet mbuf pool mempool_0000:b3:00.1 of size 65535, cache size 257, mbuf size 2176                                                                                                                      
24/11/2022 -- 18:25:42 - <Info> - DPDK IPS mode activated between 0000:b3:00.0 and 0000:b3:00.1
24/11/2022 -- 18:25:42 - <Info> - Going to use 4 thread(s) for device 0000:b3:00.1
24/11/2022 -- 18:25:43 - <Info> - Running in live mode, activating unix socket
24/11/2022 -- 18:25:43 - <Info> - Using unix socket file '/home/local/suricata/var/run/suricata/suricata-command.socket'                                                                                                                     
24/11/2022 -- 18:25:43 - <Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - Cannot create socket directory /home/local/suricata/var/run/suricata/: No such file or directory                                                                   
24/11/2022 -- 18:25:43 - <Warning> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - Unable to create unix command socket
24/11/2022 -- 18:25:43 - <Notice> - Threads created -> W: 8 FM: 1 FR: 1   Engine started.
^C24/11/2022 -- 18:25:50 - <Notice> - Signal Received.  Stopping engine.
24/11/2022 -- 18:25:51 - <Info> - time elapsed 10.830s
24/11/2022 -- 18:25:53 - <Info> - Alerts: 33120
24/11/2022 -- 18:25:53 - <Info> - cleaning up signature grouping structure... complete
24/11/2022 -- 18:25:53 - <Notice> - Stats for '0000:b3:00.0':  pkts: 4548778, drop: 996196 (21.90%), invalid chksum: 0
24/11/2022 -- 18:25:53 - <Info> - Closing device 0000:b3:00.0
24/11/2022 -- 18:25:53 - <Notice> - Stats for '0000:b3:00.1':  pkts: 3885285, drop: 32670 (0.84%), invalid chksum: 0
24/11/2022 -- 18:25:53 - <Info> - Closing device 0000:b3:00.1
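For scale, the mbuf pools in the two logs differ substantially (262143 vs 65535 mbufs, both with 2176-byte mbufs). A back-of-the-envelope estimate of the buffer memory alone, ignoring per-mbuf headroom and mempool metadata overhead:

```shell
# 262143 mbufs x 2176 B (Victor's log) vs 65535 x 2176 B (this log), in MiB:
echo "$(( 262143 * 2176 / 1048576 )) MiB vs $(( 65535 * 2176 / 1048576 )) MiB"
# prints: 543 MiB vs 135 MiB
```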
Actions #3

Updated by Victor Julien about 1 year ago

  • Target version changed from 7.0.0-rc1 to 7.0.0-rc2
Actions #4

Updated by Victor Julien 11 months ago

  • Target version changed from 7.0.0-rc2 to 7.0.0
Actions #5

Updated by Lukas Sismis 10 months ago

@Victor Julien, hasn't this been fixed by https://github.com/OISF/suricata/pull/8859?

Could you please verify, since the crash doesn't occur in my environment?

Actions #6

Updated by Victor Julien 9 months ago

  • Target version changed from 7.0.0 to 7.0.1
Actions #7

Updated by Victor Julien 8 months ago

  • Status changed from Assigned to In Progress
  • Assignee changed from Lukas Sismis to Victor Julien
  • Priority changed from High to Normal
Actions #8

Updated by Victor Julien 8 months ago

  • Status changed from In Progress to Closed