Feature #7949
Updated by Lukas Sismis 10 days ago
For deployments, it might be beneficial to define CPU affinity relatively, or to refer to the actual system configuration. Suppose we have a machine like:

<pre>
$ lscpu
…
NUMA:
  NUMA node(s):       2
  NUMA node0 CPU(s):  0-27,56-83
  NUMA node1 CPU(s):  28-55,84-111
…
</pre>

h3. Isolated cores

One example might be using isolated CPU sets for the worker cores, as those are often used in Suricata bare-metal deployments:

<pre>
cpu-affinity:
  - management-cpu-set:
      cpu: [ 1,29,57,85 ]
  - worker-cpu-set:
      cpu: [ "isolated-cores" ]
      mode: "exclusive"
      prio:
        default: "high"
</pre>

This would *mostly* centralize the CPU affinity definition in the configuration logic/process that sets up the sensor. Suricata could fetch the isolated CPU list from the system and apply it to the worker CPU set affinity (this might require custom logic per OS).

h3. Interface IRQ cores

In high-performance deployments with capture methods like AF-PACKET, operators configure the CPUs to which interrupts from the network interfaces are routed. The CPU affinity logic could query this information from the system, e.g., using @ethtool -x "$IF"@, and determine the assigned CPUs.

If @irq-all-cores@ is specified in the affinity, the query can be iterated over all interfaces in the (e.g., AF-PACKET) capture method list.

Similarly, as Suricata 8 can now assign CPU cores to interfaces per their NUMA locality ("Docs":http://docs.suricata.io/en/latest/configuration/suricata-yaml.html#automatic-numa-aware-cpu-core-pinning ), Suricata could assign CPU cores per their IRQ settings. This could be expressed in the per-interface CPU configuration too, through e.g., @irq-cores@ (potentially it can be named universally, but for now I want to distinguish the terms for global/interface-specific assignment).

h4. Global level:

<pre>
af-packet:
  - interface: eno1
  - interface: eno2

threading:
  autopin-irq: yes        <--- new
  cpu-affinity:
    management-cpu-set:
      cpu: [ 1,29,57,85 ]
    worker-cpu-set:
      cpu: [ "irq-all-cores" ]
      mode: "exclusive"
      prio:
        default: "high"
</pre>

Suricata would first query which CPUs the IRQs of eno1 and eno2 are tied to and then populate the list for worker-cpu-set. During startup, Suricata would affine the right CPUs to eno1 and eno2 individually. If eno1's IRQs were routed to CPU cores 3 and 4, and eno2's to cores 1 and 2, then worker-cpu-set would consist of [ 1, 2, 3, 4 ]. Workers processing the eno1 interface would be assigned CPUs 3 and 4.

h4. Per-interface level:

<pre>
af-packet:
  - interface: eno1
  - interface: eno2

threading:
  autopin-irq: yes        <--- would not be needed here
  cpu-affinity:
    management-cpu-set:
      cpu: [ 1,29,57,85 ]
    worker-cpu-set:
      cpu: [ "isolated" ]
      mode: "exclusive"
      prio:
        default: "high"
    interface-specific-cpu-set:
      - interface: "eno2"
        cpu: [ "irq-cores" ]
        mode: "exclusive"
        prio:
          high: [ "all" ]
          default: "medium"
</pre>

In this case, eno2 uses the IRQ-bound CPU cores, and the other interfaces use the "default" worker CPU cores.

h3. Base cores

In Suricata deployments, we often omit the first CPU cores from assignment and leave them to the kernel/OS. We could use this to define another term specifically for excluding these cores. In our example, @base-cores@ would primarily cover cores 0 and 28; if SMT (Hyperthreading) is enabled, also their siblings 56 and 84.
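Purely as an illustration of how these lists could be derived on Linux (not part of the proposal, and not Suricata code): the sketch below reads sysfs/procfs to build @isolated-cores@, per-interface @irq-cores@ and @base-cores@. The paths, the MSI-IRQ-based lookup (instead of @ethtool -x@) and the SMT-sibling interpretation of base cores are my assumptions for illustration; real support would need the per-OS logic mentioned above.

<pre>
#!/usr/bin/env python3
"""Hypothetical sketch: derive the proposed core lists from Linux sysfs/procfs."""
import glob
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list like '0-27,56-83' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def isolated_cores():
    """CPUs isolated from the scheduler (isolcpus=); may be empty."""
    with open("/sys/devices/system/cpu/isolated") as f:
        return parse_cpu_list(f.read())


def irq_cores(iface):
    """CPUs to which the interface's MSI(-X) interrupts are currently routed."""
    cpus = set()
    for irq_dir in glob.glob(f"/sys/class/net/{iface}/device/msi_irqs/*"):
        irq = os.path.basename(irq_dir)
        with open(f"/proc/irq/{irq}/smp_affinity_list") as f:
            cpus.update(parse_cpu_list(f.read()))
    return sorted(cpus)


def base_cores():
    """First CPU of each NUMA node plus its SMT siblings, left to the kernel/OS."""
    cpus = set()
    for node_dir in glob.glob("/sys/devices/system/node/node[0-9]*"):
        with open(os.path.join(node_dir, "cpulist")) as f:
            first = parse_cpu_list(f.read())[0]
        siblings = f"/sys/devices/system/cpu/cpu{first}/topology/thread_siblings_list"
        with open(siblings) as f:
            cpus.update(parse_cpu_list(f.read()))
    return sorted(cpus)


if __name__ == "__main__":
    print("isolated-cores:", isolated_cores())
    print("base-cores:    ", base_cores())
    print("irq-cores eno1:", irq_cores("eno1"))
</pre>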
h3. Support Set / Logic operators (Negation, Union, Subtraction)

The previous terms still leave the management CPU set tied to specific CPU IDs. To be more expressive in how CPU sets are defined, we could support set operators.

By expressing "all - isolated-cores - base-cores" we could define, e.g., the management CPU set without being tied to the underlying HW platform.

<pre>
cpu-affinity:
  - management-cpu-set:
      cpu: [ "all - isolated-cores - base-cores" ]
  - worker-cpu-set:
      cpu: [ "isolated-cores" ]
      mode: "exclusive"
      prio:
        default: "high"
</pre>

This could additionally support list arithmetic, e.g., use all isolated cores except 2 for worker-cpu-set, and the remaining 2 isolated cores for the management CPU set. In case SMT (Hyperthreading) is enabled, the lists should be ordered as CPU sibling pairs, e.g. 0,56,1,57,..., or there should be other means of expressing this. (A small resolution sketch is included at the end of this note.)

<pre>
cpu-affinity:
  - management-cpu-set:
      cpu: [ "isolated-cores[:1]" ]  # <- everything from the start of the list until the 2nd element (indexed from zero)
  - worker-cpu-set:
      cpu: [ "isolated-cores[2:]" ]  # <- everything from the third element until the end
      mode: "exclusive"
      prio:
        default: "high"
</pre>

h3. Other

What I also miss for optimal configuration is the ability to allocate memory structures (e.g., the flow table) on local NUMA nodes and to pin individual management threads NUMA-locally (more granular management CPU configuration) -- if two interfaces operate on two NUMA nodes, there is no way to specify that a particular flow manager will work on a particular flow table only.
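To make the set/slice expressions above concrete, here is a hypothetical Python sketch (my own interpretation of the syntax, not an implementation proposal) that resolves "+"/"-" expressions over named CPU sets and the inclusive slices used in the examples; the set contents below are made up.

<pre>
#!/usr/bin/env python3
"""Hypothetical sketch: resolve expressions like 'all - isolated-cores - base-cores'."""
import re


def resolve(expr, named_sets):
    """Evaluate a whitespace-separated '+/-' expression over named CPU sets,
    with optional inclusive [start:end] slices on the ordered named lists."""
    result = set()
    op = "+"
    # Tokens: a set name with an optional slice, or a '+'/'-' operator.
    # Operators must be surrounded by spaces so they are not taken as part
    # of hyphenated names such as "isolated-cores".
    for tok in re.findall(r"[A-Za-z][\w-]*(?:\[[^\]]*\])?|[+-]", expr):
        if tok in ("+", "-"):
            op = tok
            continue
        name, _, slice_part = tok.partition("[")
        cpus = list(named_sets[name])
        if slice_part:                                  # e.g. ":1]" or "2:]"
            start_s, _, end_s = slice_part.rstrip("]").partition(":")
            start = int(start_s) if start_s else 0
            end = int(end_s) if end_s else len(cpus) - 1
            cpus = cpus[start:end + 1]                  # inclusive upper bound
        if op == "+":
            result |= set(cpus)
        else:
            result -= set(cpus)
    return sorted(result)


if __name__ == "__main__":
    # Made-up example sets roughly matching the machine above.
    sets = {
        "all": range(0, 112),
        "isolated-cores": [2, 3, 4, 5, 58, 59, 60, 61],
        "base-cores": [0, 28, 56, 84],
    }
    print(resolve("all - isolated-cores - base-cores", sets))  # remaining 100 cores
    print(resolve("isolated-cores[:1]", sets))  # -> [2, 3]                 (management)
    print(resolve("isolated-cores[2:]", sets))  # -> [4, 5, 58, 59, 60, 61] (workers)
</pre>

One consequence of hyphenated set names like @isolated-cores@ is that the subtraction operator would have to be whitespace-separated (or a different operator character chosen) to keep the expressions unambiguous.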