Optimization #3830

open

pcap source: PcapThreadVars and cache lines

Added by Roland Fischer over 3 years ago. Updated 9 months ago.

Status:
New
Priority:
Normal
Target version:
Effort:
Difficulty:
high
Label:

Description

Mentioned in fix for #2845 in PR https://github.com/OISF/suricata/pull/5137.

We should look into optimizing cache lines for the PcapThreadVars struct.

Requires some cachegrind'ing to assess changes.

Actions #1

Updated by Roland Fischer over 3 years ago

I simply added PcapStats64 last_stats64; at the end to fix #2845 and not change the cache line behaviour for the hot path.

It is in a different cache line than the other members used at the same time. Ideally, all members used in the same hot code path sit in the same cache line. Whether last_stats64 is in the hot path is arguable; probably not, as it is only accessed once per second.

The current PcapThreadVars spans three cache lines (64 bytes each on most CPUs).

Actions #2

Updated by Roland Fischer over 3 years ago

Looking at PcapThreadVars, the members can be roughly grouped by where and how often they are used:

  • Used in the hot path in PcapCallbackLoop() which gets called for every packet
  • Used moderately in PcapDumpCounters() which gets called once per second
  • Used rarely in PcapTryReopen() which gets called in the pcap_dispatch() error path. These members are also used in ReceivePcapThreadInit()
  • Used one time in ReceivePcapThreadInit() which only gets called on thread start
Actions #3

Updated by Roland Fischer over 3 years ago

A first optimization that does not change the current cache line behaviour too much:

Because last_stats64 was put at the end, it currently straddles two cache lines, which we should ideally avoid: sizeof(PcapThreadVars) is 120 before this change. That said, it is only used once a second.
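The straddling can be checked with a little arithmetic. This is a minimal sketch, assuming 64-byte cache lines, a struct that starts on a cache line boundary, and a 24-byte PcapStats64 (three 64-bit counters; that size is an assumption, not something stated in this ticket). The helper names are hypothetical and not part of Suricata.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption: typical Intel/AMD cache line size */

/* 0-based index of the cache line that contains byte `offset`,
 * assuming the struct starts on a cache line boundary. */
size_t CacheLineOf(size_t offset)
{
    return offset / CACHE_LINE_SIZE;
}

/* 1 if a member of `size` bytes (size > 0) placed at `offset`
 * touches more than one cache line, 0 otherwise. */
int StraddlesCacheLines(size_t offset, size_t size)
{
    return CacheLineOf(offset) != CacheLineOf(offset + size - 1);
}

/* Number of cache lines needed to hold `size` bytes (rounded up). */
size_t CacheLinesNeeded(size_t size)
{
    return (size + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;
}
```

With a 120-byte struct, a 24-byte member appended at offset 120 covers bytes 120..143, so StraddlesCacheLines(120, 24) returns 1 and CacheLinesNeeded(144) is 3. If 16 bytes of init-only data move behind it, the member starts at offset 104, covers bytes 104..127, and StraddlesCacheLines(104, 24) returns 0.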

As a minimum, I would recommend moving some config data that is only used in ReceivePcapThreadInit() to the end to avoid that. last_stats64 then fits completely into the 2nd cache line, which is better. I.e.:

    ...
    ChecksumValidationMode checksum_mode;

    LiveDevice *livedev;

    PcapStats64 last_stats64;

    /* 3RD CACHE LINE - data only used once during ReceivePcapThreadInit() */
    /* ptr to string from config */
    const char *bpf_filter;

    /* pcap buffer size */
    int pcap_buffer_size;
    int pcap_snaplen;
} PcapThreadVars;
Actions #4

Updated by Roland Fischer over 3 years ago

A more thorough optimization could be:

Try to stuff all the fields used in PcapCallbackLoop() into the same cache line, assuming a typical cache line of 64 bytes for Intel/AMD. That said, this might not make a huge difference, but it should be more optimal. Famous last words.

typedef struct PcapThreadVars_
{
    /* 1ST CACHE LINE - data used in PcapCallbackLoop() */
    /* thread specific handle */
    pcap_t *pcap_handle;

    time_t last_stats_dump;

    LiveDevice *livedev;
    ThreadVars *tv;
    TmSlot *slot;

    /* counters */
    uint64_t bytes;
    uint64_t pkts;

    /* data link type for the thread */
    int datalink;

    ChecksumValidationMode checksum_mode;

    /* 2ND CACHE LINE */
    PcapStats64 last_stats64;
    uint16_t capture_kernel_packets;
    uint16_t capture_kernel_drops;
    uint16_t capture_kernel_ifdrops;

    /* ptr to string from config */
    const char *bpf_filter;

    /* thread specific bpf - pcap_setfilter() makes a copy into pcap_t */
    struct bpf_program filter;

    /** callback result -- set if one of the thread modules failed. */
    int cb_result;

    /* handle state */
    unsigned char pcap_state;

    /* 3RD CACHE LINE */
    /* pcap buffer size */
    int pcap_buffer_size;
    int pcap_snaplen;
} PcapThreadVars;

Assess this change with cachegrind.
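Before running cachegrind, the cache line comments in the proposed layout can be sanity-checked with offsetof(). This is a rough sketch only: every StandIn* type below is a hypothetical placeholder so the example compiles without the Suricata and libpcap headers, and their sizes (three 64-bit counters for PcapStats64, a length plus a pointer for struct bpf_program) are assumptions. The result depends on the platform; the expected values assume a typical LP64 system with 8-byte pointers and an 8-byte time_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CACHE_LINE_SIZE 64 /* assumption: typical Intel/AMD cache line size */

/* Hypothetical stand-ins, NOT the real Suricata/libpcap types. */
typedef struct StandInPcap_ StandInPcap;             /* for pcap_t */
typedef struct StandInLiveDevice_ StandInLiveDevice; /* for LiveDevice */
typedef struct StandInThreadVars_ StandInThreadVars; /* for ThreadVars */
typedef struct StandInTmSlot_ StandInTmSlot;         /* for TmSlot */
typedef enum { STANDIN_CHECKSUM_MODE } StandInChecksumMode;
typedef struct { uint64_t counters[3]; } StandInStats64;               /* assumed size */
typedef struct { unsigned int len; void *insns; } StandInBpfProgram;   /* assumed size */

/* Same member order as the proposed PcapThreadVars. */
typedef struct ProposedVars_ {
    StandInPcap *pcap_handle;
    time_t last_stats_dump;
    StandInLiveDevice *livedev;
    StandInThreadVars *tv;
    StandInTmSlot *slot;
    uint64_t bytes;
    uint64_t pkts;
    int datalink;
    StandInChecksumMode checksum_mode;
    StandInStats64 last_stats64;
    uint16_t capture_kernel_packets;
    uint16_t capture_kernel_drops;
    uint16_t capture_kernel_ifdrops;
    const char *bpf_filter;
    StandInBpfProgram filter;
    int cb_result;
    unsigned char pcap_state;
    int pcap_buffer_size;
    int pcap_snaplen;
} ProposedVars;

/* 0-based cache line in which a member starts. */
#define CACHE_LINE_OF(type, member) (offsetof(type, member) / CACHE_LINE_SIZE)

/* 1 if the members start in the cache lines named in the comments. */
int ProposedLayoutMatchesComments(void)
{
    return CACHE_LINE_OF(ProposedVars, pcap_handle) == 0 &&
           CACHE_LINE_OF(ProposedVars, checksum_mode) == 0 &&
           CACHE_LINE_OF(ProposedVars, last_stats64) == 1 &&
           CACHE_LINE_OF(ProposedVars, pcap_state) == 1 &&
           CACHE_LINE_OF(ProposedVars, pcap_buffer_size) == 2;
}
```

In the real code the same macro would be applied to PcapThreadVars itself, so that the compiler's actual alignment and packing settings are used instead of these stand-ins. This only confirms the layout; whether it reduces cache misses still has to be measured.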

Actions #5

Updated by Roland Fischer over 3 years ago

A few ways to figure out how a struct fits into cache lines (typically 64 bytes):

  • try to wrap your head around struct sizes by looking at the code and knowing the compile settings, or
  • use an online compiler such as godbolt by extracting the struct you want to fiddle with - this still requires you to figure out the compile settings, or
  • simply instrument the code so the compiler tells you during compilation or at runtime, as it has the correct alignment and packing settings:
    • _Static_assert() on sizeof(PcapThreadVars) or sizeof(CopyofTrimmedStructYouWantToKnowSizeOf) is your friend if you fancy quick compile errors to figure out sizes
    • log sizeof(PcapThreadVars) or sizeof(CopyofTrimmedStructYouWantToKnowSizeOf) if you want to run the program to figure out sizes.
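The last two options could look roughly like this. It is a sketch only: the trimmed struct is a hypothetical example rather than the real PcapThreadVars, LogStructSize() is a made-up helper, and plain fprintf() stands in for whatever logging the real code uses.

```c
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE_SIZE 64 /* assumption: typical Intel/AMD cache line size */

/* Hypothetical trimmed copy of the struct under investigation. */
typedef struct CopyofTrimmedStructYouWantToKnowSizeOf_ {
    void *handle;
    uint64_t bytes;
    uint64_t pkts;
    int datalink;
} CopyofTrimmedStructYouWantToKnowSizeOf;

/* Compile-time check: the build fails as soon as the struct
 * grows beyond one cache line. */
_Static_assert(sizeof(CopyofTrimmedStructYouWantToKnowSizeOf) <= CACHE_LINE_SIZE,
        "trimmed struct no longer fits into one cache line");

/* Runtime check: print the size as seen by the compiler.
 * Returns the fprintf() result (negative on error). */
int LogStructSize(FILE *out)
{
    return fprintf(out, "sizeof(CopyofTrimmedStructYouWantToKnowSizeOf) = %zu\n",
            sizeof(CopyofTrimmedStructYouWantToKnowSizeOf));
}
```

Asserting a guessed exact size instead, e.g. sizeof(...) == 32, gives a quick compile error whenever the guess is wrong, which is handy while reordering members.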
Actions #6

Updated by Philippe Antoine 9 months ago

  • Assignee set to Community Ticket
  • Target version set to TBD