Actions
Bug #5619
closeddpdk/ips: crash at shutdown with mlx
Affected Versions:
Effort:
Difficulty:
Label:
Description
I consistently see this crash when using this NIC
0000:85:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens3f0np0 drv=mlx5_core unused=vfio-pci 0000:85:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens3f1np1 drv=mlx5_core unused=vfio-pci
It needs to see some traffic. A start and stop when no traffic is received works fine.
victor@z840:~/dev/suricata$ sudo ./src/suricata -c suricata-ips-mlx-25G-profile-nsm-full.yaml -l /home/victor/vm/logs/ --disable-detection --set flow.memcap=32gb --set stream.memcap=16gb --set stream.reassembly.memcap=64gb --dpdk -v 27/10/2022 -- 13:49:17 - <Notice> - This is Suricata version 7.0.0-beta1 RELEASE running in SYSTEM mode 27/10/2022 -- 13:49:17 - <Info> - CPUs/cores online: 56 27/10/2022 -- 13:49:17 - <Info> - Shortening device name to: 0000..00.0 27/10/2022 -- 13:49:17 - <Info> - Shortening device name to: 0000..00.1 27/10/2022 -- 13:49:17 - <Info> - eve-log output device (regular) initialized: /dev/null 27/10/2022 -- 13:49:17 - <Info> - stats output device (regular) initialized: stats.log EAL: No available hugepages reported in hugepages-1048576kB 27/10/2022 -- 13:49:18 - <Info> - DPDK IPS mode activated between 0000:85:00.0 and 0000:85:00.1 27/10/2022 -- 13:49:18 - <Info> - Creating a packet mbuf pool mempool_0000:85:00.0 of size 262143, cache size 511, mbuf size 2176 27/10/2022 -- 13:49:18 - <Info> - DPDK IPS mode activated between 0000:85:00.1 and 0000:85:00.0 27/10/2022 -- 13:49:18 - <Info> - Going to use 27 thread(s) for device 0000:85:00.0 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:18 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:19 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:19 - <Info> - DPDK IPS mode activated between 0000:85:00.1 and 0000:85:00.0 27/10/2022 -- 13:49:19 - <Info> - Creating a packet mbuf pool mempool_0000:85:00.1 of size 262143, cache size 511, mbuf size 2176 27/10/2022 -- 13:49:20 - <Info> - DPDK IPS mode activated between 0000:85:00.0 and 0000:85:00.1 27/10/2022 -- 13:49:20 - <Info> - Going to use 27 thread(s) for device 0000:85:00.1 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:20 - <Warning> - [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 1 but thread on NUMA 0. Decreased performance expected 27/10/2022 -- 13:49:21 - <Info> - Running in live mode, activating unix socket 27/10/2022 -- 13:49:21 - <Info> - Using unix socket file '/var/run/suricata/suricata-command.socket' 27/10/2022 -- 13:49:21 - <Notice> - Threads created -> W: 54 FM: 1 FR: 1 Engine started. ^C27/10/2022 -- 13:49:36 - <Notice> - Signal Received. Stopping engine. 27/10/2022 -- 13:49:36 - <Info> - time elapsed 18.715s 27/10/2022 -- 13:49:40 - <Info> - Alerts: 0 27/10/2022 -- 13:49:40 - <Notice> - Stats for '0000:85:00.0': pkts: 7077119, drop: 5858965 (82.79%), invalid chksum: 0 27/10/2022 -- 13:49:40 - <Info> - Closing device 0000:85:00.0 Segmentation fault
Updated by Victor Julien about 2 years ago
bt
27/10/2022 -- 15:11:56 - <Info> - Alerts: 0 [Thread 0x7ffedcff9700 (LWP 4171194) exited] 27/10/2022 -- 15:11:56 - <Notice> - Stats for '0000:85:00.0': pkts: 10427224, drop: 3915191 (37.55%), invalid chksum: 0 27/10/2022 -- 15:11:56 - <Info> - Closing device 0000:85:00.0 Thread 1 "Suricata-Main" received signal SIGSEGV, Segmentation fault. 0x00007ffff3560359 in ?? () from /usr/lib/x86_64-linux-gnu/dpdk/pmds-20.0/librte_pmd_mlx5.so (gdb) bt #0 0x00007ffff3560359 in ?? () from /usr/lib/x86_64-linux-gnu/dpdk/pmds-20.0/librte_pmd_mlx5.so #1 0x00007ffff3690858 in ?? () from /usr/lib/x86_64-linux-gnu/dpdk/pmds-20.0/librte_pmd_mlx5.so #2 0x00007ffff3556680 in ?? () from /usr/lib/x86_64-linux-gnu/dpdk/pmds-20.0/librte_pmd_mlx5.so #3 0x00007ffff6dd5d8d in rte_eth_dev_close () from /lib/x86_64-linux-gnu/librte_ethdev.so.20.0 #4 0x0000000000552a2e in DPDKCloseDevice (ldev=ldev@entry=0xdb8340) at util-dpdk.c:54 #5 0x00000000005522c1 in LiveDeviceListClean () at util-device.c:364 #6 0x0000000000543f7c in GlobalsDestroy (suri=<optimized out>) at suricata.c:364 #7 0x00000000005439a2 in SuricataMain (argc=<optimized out>, argv=<optimized out>) at suricata.c:2960 #8 0x00007ffff6aad083 in __libc_start_main (main=0x5400a0 <main>, argc=14, argv=0x7fffffffe3a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe398) at ../csu/libc-start.c:308 #9 0x000000000053ffde in _start ()
Updated by Lukas Sismis about 2 years ago
I've got the same card
0000:b3:00.0 'MT27700 Family [ConnectX-4] 1013' if=ens1f0 drv=mlx5_core unused= 0000:b3:00.1 'MT27700 Family [ConnectX-4] 1013' if=ens1f1 drv=mlx5_core unused=
I tried to hook it up in the IPS mode but I couldn't reproduce any segfault so far. I tried it both with single packet/traffic capture, with and without rules. I had 4 workers enabled.
The only difference I see is that you had a multi-NUMA node machine and the machine I had available for the test has only a single NUMA node.
24/11/2022 -- 18:25:28 - <Info> - 21315 signatures processed. 1417 are IP-only rules, 3811 are inspecting packet payload, 15857 inspect application layer, 104 are decoder event only EAL: No available 1048576 kB hugepages reported mlx5_net: No available register for sampler. mlx5_net: No available register for sampler. TELEMETRY: No legacy callbacks, legacy socket not created 24/11/2022 -- 18:25:42 - <Info> - DPDK IPS mode activated between 0000:b3:00.0 and 0000:b3:00.1 24/11/2022 -- 18:25:42 - <Info> - Creating a packet mbuf pool mempool_0000:b3:00.0 of size 65535, cache size 257, mbuf size 2176 24/11/2022 -- 18:25:42 - <Info> - DPDK IPS mode activated between 0000:b3:00.1 and 0000:b3:00.0 24/11/2022 -- 18:25:42 - <Info> - Going to use 4 thread(s) for device 0000:b3:00.0 24/11/2022 -- 18:25:42 - <Info> - DPDK IPS mode activated between 0000:b3:00.1 and 0000:b3:00.0 24/11/2022 -- 18:25:42 - <Info> - Creating a packet mbuf pool mempool_0000:b3:00.1 of size 65535, cache size 257, mbuf size 2176 24/11/2022 -- 18:25:42 - <Info> - DPDK IPS mode activated between 0000:b3:00.0 and 0000:b3:00.1 24/11/2022 -- 18:25:42 - <Info> - Going to use 4 thread(s) for device 0000:b3:00.1 24/11/2022 -- 18:25:43 - <Info> - Running in live mode, activating unix socket 24/11/2022 -- 18:25:43 - <Info> - Using unix socket file '/home/local/suricata/var/run/suricata/suricata-command.socket' 24/11/2022 -- 18:25:43 - <Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - Cannot create socket directory /home/local/suricata/var/run/suricata/: No such file or directory 24/11/2022 -- 18:25:43 - <Warning> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - Unable to create unix command socket 24/11/2022 -- 18:25:43 - <Notice> - Threads created -> W: 8 FM: 1 FR: 1 Engine started. ^C24/11/2022 -- 18:25:50 - <Notice> - Signal Received. Stopping engine. 24/11/2022 -- 18:25:51 - <Info> - time elapsed 10.830s 24/11/2022 -- 18:25:53 - <Info> - Alerts: 33120 24/11/2022 -- 18:25:53 - <Info> - cleaning up signature grouping structure... complete 24/11/2022 -- 18:25:53 - <Notice> - Stats for '0000:b3:00.0': pkts: 4548778, drop: 996196 (21.90%), invalid chksum: 0 24/11/2022 -- 18:25:53 - <Info> - Closing device 0000:b3:00.0 24/11/2022 -- 18:25:53 - <Notice> - Stats for '0000:b3:00.1': pkts: 3885285, drop: 32670 (0.84%), invalid chksum: 0 24/11/2022 -- 18:25:53 - <Info> - Closing device 0000:b3:00.1
Updated by Victor Julien almost 2 years ago
- Target version changed from 7.0.0-rc1 to 7.0.0-rc2
Updated by Victor Julien over 1 year ago
- Target version changed from 7.0.0-rc2 to 7.0.0
Updated by Lukas Sismis over 1 year ago
@Victor Julien has this not been fixed by https://github.com/OISF/suricata/pull/8859 ?
Can you please verify that as the crash doesn't occur in my environment?
Updated by Victor Julien over 1 year ago
- Target version changed from 7.0.0 to 7.0.1
Updated by Victor Julien over 1 year ago
- Status changed from Assigned to In Progress
- Assignee changed from Lukas Sismis to Victor Julien
- Priority changed from High to Normal
Updated by Victor Julien over 1 year ago
- Status changed from In Progress to Closed
Actions