Task #3318


Research: NUMA awareness

Added by Victor Julien over 4 years ago. Updated 10 months ago.

Status: New
Priority: Normal
Assignee:
Target version:
Effort:
Difficulty:
Label:

Description

In several talks at SuriCon we've seen that the best performance is achieved when the NIC and Suricata are on the same NUMA node, and that Suricata should be limited to that node.

Even in a multi-NIC scenario, Suricata will likely not perform well when running across multiple nodes at once, as global data structures like the flow table are then frequently accessed and updated over the interconnect.

Evaluate what strategies exist.
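
As a starting point for that evaluation, here is a minimal sketch (not Suricata code) of how the NIC-to-node mapping could be discovered: the kernel exposes the PCI device's NUMA node in sysfs, and libnuma can list the CPUs on that node. The interface name "eth0" is a placeholder.

```
/* Sketch: find the NUMA node of a NIC via sysfs and list the CPUs on
 * that node with libnuma. "eth0" is a placeholder interface name.
 * Build with: gcc -o nicnode nicnode.c -lnuma */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available on this system\n");
        return 1;
    }

    /* The kernel reports the node the PCI device is attached to;
     * -1 means single-node or not reported by the platform. */
    int node = -1;
    FILE *f = fopen("/sys/class/net/eth0/device/numa_node", "r");
    if (f != NULL) {
        if (fscanf(f, "%d", &node) != 1)
            node = -1;
        fclose(f);
    }
    printf("NIC is on NUMA node %d\n", node);

    if (node < 0)
        return 0;

    /* List the CPUs that belong to that node: these are the CPUs the
     * worker threads would be limited to. */
    struct bitmask *cpus = numa_allocate_cpumask();
    if (numa_node_to_cpus(node, cpus) == 0) {
        for (unsigned int i = 0; i < cpus->size; i++) {
            if (numa_bitmask_isbitset(cpus, i))
                printf("cpu %u\n", i);
        }
    }
    numa_free_cpumask(cpus);
    return 0;
}
```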

Reading material:
https://www.akkadia.org/drepper/cpumemory.pdf
https://stackoverflow.com/a/47714514/2756873


Related issues (2 open, 0 closed)

Related to Suricata - Task #3288: Suricon 2019 brainstorm (Assigned, Victor Julien)
Related to Suricata - Task #3695: research: libhwloc for better autoconfiguration (Assigned, Shivani Bhardwaj)
#1

Updated by Victor Julien over 4 years ago

  • Related to Task #3288: Suricon 2019 brainstorm added
#2

Updated by Victor Julien over 4 years ago

  • Tracker changed from Feature to Task
  • Subject changed from numa awareness to Research: NUMA awareness
#3

Updated by Victor Julien over 4 years ago

Several possible subtasks come to mind:
  1. making configuration easier: take NUMA into account when configuring CPU affinity. Currently a list of CPUs has to be provided, which can be tedious and error-prone. libnuma could help with identifying the CPUs that belong to a node (see the sketch after this list).
  2. assign memory to specific nodes: the default allocation behaviour (at least on Linux) seems to already be that the allocating thread allocates memory on its own node. For packets we already do this correctly, with packet pools initialized per thread, in the thread. But for example the flow spare queue is global and the flows in it are initially alloc'd from the main thread, and later updated from the flow manager. This means these flows will likely be unbalanced and lean towards one node more than others. Creating per-thread flow spare queues could be one way to address this. Similarly for other 'pools' like stream segments, sessions, etc.
  3. duplicate data structures per node. Not sure yet if this is a good strategy, but the idea is that something like the flow table or detect engine would have a copy per node to guarantee locality. In a properly functioning flow table this should be clean, as the flows should stay on the same thread (=CPU). For the detection engine this would essentially duplicate its memory use per node. Unless loading is done in parallel, start-up time would also increase.
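
To make the first two items more concrete, below is a minimal libnuma sketch, not existing Suricata code: node_of_worker_cpu() and alloc_flow_block_on_node() are illustrative names for mapping a CPU to its node (item 1) and for backing a hypothetical per-node pool such as a flow spare queue with node-local memory (item 2).

```
/* Sketch of the libnuma calls behind subtasks 1 and 2; the function
 * names are illustrative, not existing Suricata code.
 * Build with: gcc -o numasketch numasketch.c -lnuma */
#include <stdio.h>
#include <stdlib.h>
#include <numa.h>

/* Subtask 1: instead of asking the user for an explicit CPU list,
 * derive the node a worker CPU belongs to. */
static int node_of_worker_cpu(int cpu)
{
    return numa_node_of_cpu(cpu);   /* -1 on error */
}

/* Subtask 2: back a per-node pool (e.g. a flow spare queue) with
 * memory that is guaranteed to live on that node. */
static void *alloc_flow_block_on_node(size_t size, int node)
{
    /* numa_alloc_onnode() rounds up to page size and places the
     * pages on the requested node. */
    return numa_alloc_onnode(size, node);
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support\n");
        return 1;
    }

    int nodes = numa_num_configured_nodes();
    int cpus  = numa_num_configured_cpus();
    printf("%d node(s), %d cpu(s)\n", nodes, cpus);

    for (int cpu = 0; cpu < cpus; cpu++)
        printf("cpu %d -> node %d\n", cpu, node_of_worker_cpu(cpu));

    /* One hypothetical 1 MiB flow block per node. */
    for (int n = 0; n < nodes; n++) {
        void *block = alloc_flow_block_on_node(1 << 20, n);
        if (block != NULL)
            numa_free(block, 1 << 20);
    }
    return 0;
}
```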
#4

Updated by Andreas Herz over 4 years ago

  • Assignee set to OISF Dev
  • Target version set to TBD
#5

Updated by Victor Julien over 4 years ago

  • Description updated (diff)
  • Status changed from New to Assigned
  • Assignee changed from OISF Dev to Victor Julien
#6

Updated by Andreas Herz over 4 years ago

Do we also have more insight into how this affects the management threads, for example? Could we at least move those to a different node to keep the other CPU cores free for the heavy tasks?

#7

Updated by Victor Julien over 4 years ago

They would probably have to run on the same node as the traffic, where the memory for that traffic is owned, to avoid accessing locks over the interconnect.
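A rough sketch of what that could look like, assuming the node id is already known from the capture side: the calling thread (e.g. the flow manager) is restricted to the CPUs of that node via libnuma and pthread affinity. pin_thread_to_node() is an illustrative name, not an existing Suricata function.

```
/* Sketch: pin the calling (e.g. flow manager) thread to the CPUs of a
 * given NUMA node so lock and flow-table accesses stay node-local.
 * Build with: gcc -o pinnode pinnode.c -lnuma -lpthread */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <numa.h>

static int pin_thread_to_node(int node)
{
    struct bitmask *cpus = numa_allocate_cpumask();
    if (numa_node_to_cpus(node, cpus) != 0) {
        numa_free_cpumask(cpus);
        return -1;
    }

    /* Translate the node's CPU mask into a cpu_set_t. */
    cpu_set_t set;
    CPU_ZERO(&set);
    for (unsigned int i = 0; i < cpus->size; i++) {
        if (numa_bitmask_isbitset(cpus, i))
            CPU_SET(i, &set);
    }
    numa_free_cpumask(cpus);

    /* Restrict the current thread to that node's CPUs. */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    if (numa_available() < 0)
        return 1;
    /* Node 0 as a placeholder; in practice the node of the capture NIC. */
    return pin_thread_to_node(0) == 0 ? 0 : 1;
}
```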

#8

Updated by Victor Julien over 3 years ago

  • Related to Task #3695: research: libhwloc for better autoconfiguration added
#9

Updated by Victor Julien 10 months ago

  • Status changed from Assigned to New
  • Assignee changed from Victor Julien to OISF Dev

@Lukas Sismis, since you've been doing a bit of NUMA work for DPDK, I wonder if you have some thoughts on the topic.

