Optimization #7585

open

af-packet: SOF_TIMESTAMPING_RAW_HARDWARE dangerous default leading to incorrect timestamps

Added by Michael Stone 9 months ago. Updated about 4 hours ago.

Status: In Review
Priority: Normal

Description

By default in af_packet mode, Suricata attempts to set SOF_TIMESTAMPING_RAW_HARDWARE on a monitored interface. Unfortunately, this can mean that the timestamps come from an unsynchronized clock on the NIC.

Background: many modern NICs are designed to support PTP (precision time protocol), and to facilitate that the NIC has its own clock. That clock is kept accurate by some timesource and managed via userspace tools such as the linuxptp project. Regardless of whether PTP is in use, the clock seems to be initialized on boot. Then, the NIC clock is never again synced unless steps are taken to do so. The result is that the NIC clock and the system clock begin to drift apart. How quickly that happens depends on the accuracy of the NIC clock and how long the system runs. Some real-life examples:

a system with about 90 days of uptime and 3 NICs:

# for i in /dev/ptp* ; do phc_ctl $i get ; done ; date
phc_ctl[8147391.663]: clock time is 1741378468.001545494 or Fri Mar  7 20:14:28 2025
phc_ctl[8147391.669]: clock time is 1741378458.824065546 or Fri Mar  7 20:14:18 2025
phc_ctl[8147391.676]: clock time is 1741378495.935860299 or Fri Mar  7 20:14:55 2025
Fri 07 Mar 2025 08:17:01 PM UTC

in this case one interface is not in use and reflects the last system boot time:

# for i in /dev/ptp* ; do phc_ctl $i get ; done ; date
phc_ctl[8146929.952]: clock time is 1741378259.724956768 or Fri Mar  7 20:10:59 2025
phc_ctl[8146929.955]: clock time is 1733231435.818002320 or Tue Dec  3 13:10:35 2024
Fri 07 Mar 2025 08:12:43 PM UTC

20+ minutes slow, and each NIC is significantly different from the others:

# for i in /dev/ptp* ; do phc_ctl $i get ; done ; date
phc_ctl[90128374.727]: clock time is 1741377498.633167475 or Fri Mar  7 14:58:18 2025
phc_ctl[90128374.733]: clock time is 1741377452.185620309 or Fri Mar  7 14:57:32 2025
phc_ctl[90128374.739]: clock time is 1741377355.754238877 or Fri Mar  7 14:55:55 2025
Fri 07 Mar 2025 03:21:54 PM EST

nearly 2 minutes in the future, after 60 days uptime:

# for i in /dev/ptp* ; do phc_ctl $i get ; done ; date
phc_ctl[5243773.229]: clock time is 1741379536.863106904 or Fri Mar  7 20:32:16 2025
phc_ctl[5243773.232]: clock time is 1741379536.865495696 or Fri Mar  7 20:32:16 2025
phc_ctl[5243773.234]: clock time is 1736135658.288802528 or Mon Jan  6 03:54:18 2025
phc_ctl[5243773.236]: clock time is 1741379536.869478312 or Fri Mar  7 20:32:16 2025
Fri Mar  7 08:30:25 PM UTC 2025
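
The offsets in transcripts like these can be computed rather than eyeballed. The following is a minimal sketch, not part of the original report: it parses the phc_ctl "get" lines from the last example and subtracts the time printed by date(1). The function name phc_offsets is hypothetical, and because date(1) runs after the loop and prints whole seconds, the result is only accurate to about a second.

```python
import re
from datetime import datetime, timezone

# Matches lines such as:
# phc_ctl[5243773.229]: clock time is 1741379536.863106904 or Fri Mar  7 20:32:16 2025
PHC_LINE = re.compile(r"clock time is (\d+\.\d+)")


def phc_offsets(phc_output: str, system_time: datetime) -> list[float]:
    """Return (NIC clock - system clock) in seconds for each phc_ctl 'get' line."""
    reference = system_time.timestamp()
    return [float(m.group(1)) - reference for m in PHC_LINE.finditer(phc_output)]


# Copied from the 60-days-uptime transcript in this ticket.
SAMPLE = """\
phc_ctl[5243773.229]: clock time is 1741379536.863106904 or Fri Mar  7 20:32:16 2025
phc_ctl[5243773.232]: clock time is 1741379536.865495696 or Fri Mar  7 20:32:16 2025
phc_ctl[5243773.234]: clock time is 1736135658.288802528 or Mon Jan  6 03:54:18 2025
phc_ctl[5243773.236]: clock time is 1741379536.869478312 or Fri Mar  7 20:32:16 2025
"""
# System time printed by date(1) at the end of the same transcript.
SYSTEM_TIME = datetime(2025, 3, 7, 20, 30, 25, tzinfo=timezone.utc)

if __name__ == "__main__":
    for offset in phc_offsets(SAMPLE, SYSTEM_TIME):
        # Positive means the NIC clock is ahead of the system clock.
        print(f"{offset:+.3f} s")
```

A positive value means the NIC clock is ahead of the system clock; the third interface, which is not in use, comes out roughly 60 days behind.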

Some of these are Intel hardware, some Broadcom. They include both server-grade and consumer-grade models. Here's how this affects packet capture:

# date ; tcpdump -n -i enp1s0 -j adapter_unsynced 
Fri 07 Mar 2025 08:51:07 PM UTC
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp1s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:51:07.431126 IP [outgoing]
20:48:33.458943 IP [incoming]
20:51:07.506481 IP [outgoing]
20:48:33.539606 IP [incoming]
20:51:07.610049 IP [outgoing]
20:48:33.640774 IP [incoming]
20:51:07.714061 IP [outgoing]

Note that when using raw hardware timestamps, the incoming packets have the wrong time, while the outgoing packets match the system clock. The "-j adapter_unsynced" option uses SOF_TIMESTAMPING_RAW_HARDWARE. libpcap also supports "-j adapter", which uses SOF_TIMESTAMPING_SYS_HARDWARE, but that option is no longer available in Linux.
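
The alternating pattern in the tcpdump output can be detected mechanically. This is a rough sketch under stated assumptions: the function name backward_jumps and the one-second tolerance are made up for illustration, the timestamps are the ones printed above, and midnight wraparound is ignored.

```python
from datetime import datetime


def backward_jumps(timestamps: list[str], tolerance: float = 1.0) -> list[tuple[int, float]]:
    """Return (index, delta_seconds) for every timestamp that is earlier than
    its predecessor by more than `tolerance` seconds."""
    parsed = [datetime.strptime(ts, "%H:%M:%S.%f") for ts in timestamps]
    jumps = []
    for i in range(1, len(parsed)):
        delta = (parsed[i] - parsed[i - 1]).total_seconds()
        if delta < -tolerance:
            jumps.append((i, delta))
    return jumps


# Timestamps from the tcpdump transcript in this ticket.
CAPTURE = [
    "20:51:07.431126",  # outgoing
    "20:48:33.458943",  # incoming
    "20:51:07.506481",  # outgoing
    "20:48:33.539606",  # incoming
    "20:51:07.610049",  # outgoing
    "20:48:33.640774",  # incoming
    "20:51:07.714061",  # outgoing
]

if __name__ == "__main__":
    for index, delta in backward_jumps(CAPTURE):
        print(f"packet {index}: {delta:.3f} s relative to the previous packet")
```

On this capture every incoming packet is flagged, each about 154 seconds behind the outgoing packet before it.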

How did this escape notice for so long? I suspect because:
  1. it doesn't become noticeable until the system has been up for a significant amount of time
  2. if the clock runs slow instead of fast the results could be interpreted as a processing delay
  3. it is difficult to identify that the data is wrong
  4. it is hardware and kernel configuration dependent
  5. there is no problem if the system is using PTP to synchronize network interfaces
  6. it only affects af-packet mode
  7. if someone does notice a problem it goes away with a reboot and they chalk it up as a fluke

If eve logs are available with stats the problem can be identified fairly easily because the stat timestamp is based on the system clock rather than the NIC clock:
["2025-03-01T23:32:13.630827+0000","tls"]
["2025-03-01T23:19:21.712747+0000","stats"]
["2025-03-01T23:32:13.662655+0000","tls"]
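
The same comparison can be scripted against eve.json. The sketch below is an illustration, not something Suricata ships: it assumes each line is a JSON object with "timestamp" and "event_type" fields, that the record preceding a stats record carries a packet-derived timestamp, and that a 60-second threshold (an arbitrary choice) separates clock skew from normal processing delay. The function name stats_skew is hypothetical.

```python
import json
from datetime import datetime

EVE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def stats_skew(eve_lines, threshold: float = 60.0) -> list[tuple[str, float]]:
    """Return (stats timestamp, skew in seconds) for each stats record whose
    system-clock timestamp differs from the preceding non-stats record by
    more than `threshold` seconds. Positive skew: packet time is ahead."""
    findings = []
    last_packet_time = None
    for line in eve_lines:
        record = json.loads(line)
        when = datetime.strptime(record["timestamp"], EVE_TIME_FORMAT)
        if record["event_type"] == "stats":
            if last_packet_time is not None:
                skew = (last_packet_time - when).total_seconds()
                if abs(skew) > threshold:
                    findings.append((record["timestamp"], skew))
        else:
            last_packet_time = when
    return findings


# The three records quoted above, rewritten as minimal eve-style objects.
SAMPLE = [
    '{"timestamp": "2025-03-01T23:32:13.630827+0000", "event_type": "tls"}',
    '{"timestamp": "2025-03-01T23:19:21.712747+0000", "event_type": "stats"}',
    '{"timestamp": "2025-03-01T23:32:13.662655+0000", "event_type": "tls"}',
]

if __name__ == "__main__":
    for timestamp, skew in stats_skew(SAMPLE):
        print(f"stats record at {timestamp}: packet time is {skew:+.1f} s off")
```

For the records quoted above, the packet timestamps are nearly 13 minutes ahead of the system clock.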

Testing is possible (assuming the right networking hardware) by simply using date(1) to change the time; the NIC clock will not reflect the change.

It is possible to set the NIC clock using the linuxptp phc_ctl command and the keyword "set"; this must be done on each interface, and must be done periodically as the clocks drift. I am not aware of a tool to synchronize the NIC clock to the system clock rather than the other way around.
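
A periodic resync along these lines could look like the following. This is only a sketch of the workaround described above, with hypothetical function names and an arbitrary 60-second interval; it calls "phc_ctl <interface> set" in the same form used elsewhere in this ticket. Each call steps the NIC clock, so timestamps still drift between runs and can jump at each step.

```python
import subprocess
import time


def phc_set_command(interface: str) -> list[str]:
    """Build the phc_ctl invocation that sets the NIC clock to the system time."""
    return ["phc_ctl", interface, "set"]


def resync_forever(interfaces: list[str], interval_seconds: float = 60.0) -> None:
    """Step the clock of every listed interface, sleep, and repeat.

    Needs root and the linuxptp tools. The interval is an assumption; a
    badly drifting NIC may need a much shorter one.
    """
    while True:
        for interface in interfaces:
            # check=False: an interface without a PHC should not stop the loop.
            subprocess.run(phc_set_command(interface), check=False)
        time.sleep(interval_seconds)


# Example call (runs until interrupted), using the interface from the tcpdump example:
# resync_forever(["enp1s0"])
```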

So what should be done in suricata? Hardware timestamping should be disabled by default. A configuration option to enable it may be desired in cases where the clock is known to be good, e.g., when the system is synchronizing time using PTP. The manual should clearly reflect the potential problems with hardware timestamps.


Related issues 3 (2 open, 1 closed)

Related to Suricata - Feature #1954: runtime option/flag to disable hardware timestamp support (Resolved, Victor Julien)
Related to Suricata - Bug #7115: dpdk: timestamping packets through TSC does not yield the same time as kernel time (Closed, Lukas Sismis)
Related to Suricata - Feature #7117: dpdk: hardware timestamping for packets (Assigned, Lukas Sismis)
Actions #1

Updated by Michael Stone 9 months ago

By "I am not aware of a tool to synchronize the NIC clock to the system clock rather than the other way around" I mean to say that I'm not aware of a tool that does this on a continuous basis without jumping the time (adjtime), as NTP does to the system clock. The phc_ctl set command will set the NIC clock, but does so atomically.

Actions #2

Updated by Victor Julien 9 months ago

  • Related to Feature #1954: runtime option/flag to disable hardware timestamp support added
Actions #3

Updated by Victor Julien 9 months ago

  • Related to Bug #7115: dpdk: timestamping packets through TSC does not yield the same time as kernel time added
Actions #4

Updated by Victor Julien 9 months ago

Thanks for your report, Michael. It looks like the long-standing request to make it configurable (#1954) is a good first step. It also sounds quite similar to DPDK ticket #7115.

@Eric Leblond @Lukas Sismis any thoughts?

Actions #5

Updated by Philippe Antoine 5 months ago

  • Tracker changed from Bug to Optimization
Actions #6

Updated by Jason Taylor 5 months ago

For what it's worth, #1954 was requested based on our observation of what Michael describes here. In our case we observed the issues with Mellanox NICs.

Actions #7

Updated by Victor Julien 30 days ago

  • Status changed from New to Assigned
  • Target version changed from TBD to 9.0.0-beta1
Actions #8

Updated by Victor Julien 10 days ago

I've added an option to disable the feature as part of #1954. I think we need to consider disabling hwtimestamping by default, but I'd like to understand what perf impact this may have. Also I think we need some documentation on how to use it safely, if this is possible at all.

Actions #9

Updated by Victor Julien 10 days ago

  • Related to Feature #7117: dpdk: hardware timestamping for packets added
Actions #10

Updated by Victor Julien 10 days ago

Playing a bit with the commands, it's clear that some setups suffer more from this than others. My worst so far is an X710-T4:

# ethtool -i ens3f1np1 && ethtool -T ens3f1np1 && date && phc_ctl ens3f1np1 get && date
driver: i40e
version: 6.17.0-7-generic
firmware-version: 6.01 0x800035ce 1.1747.0
expansion-rom-version: 
bus-info: 0000:07:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Time stamping parameters for ens3f1np1:
Capabilities:
        hardware-transmit
        software-transmit
        hardware-receive
        software-receive
        software-system-clock
        hardware-raw-clock
Hardware timestamp provider index: 6
Hardware timestamp provider qualifier: Precise (IEEE 1588 quality)
Hardware Transmit Timestamp Modes:
        off
        on
Hardware Receive Filter Modes:
        none
        ptpv1-l4-sync
        ptpv1-l4-delay-req
        ptpv2-l4-event
        ptpv2-l4-sync
        ptpv2-l4-delay-req
        ptpv2-l2-event
        ptpv2-l2-sync
        ptpv2-l2-delay-req
        ptpv2-event
        ptpv2-sync
        ptpv2-delay-req
Sun Nov 30 08:53:38 CET 2025
phc_ctl[167330.904]: clock time is 1764405554.220590486 or Sat Nov 29 09:39:14 2025

Sun Nov 30 08:53:38 CET 2025

Actions #11

Updated by Victor Julien 10 days ago

It also starts drifting immediately after syncing it:

root@z420:/home/victor# phc_ctl ens3f1np1 set
phc_ctl[167496.286]: set clock time to 1764489384.009790747 or Sun Nov 30 08:56:24 2025

root@z420:/home/victor# phc_ctl ens3f1np1 get && date
phc_ctl[167504.923]: clock time is 1764489388.328344519 or Sun Nov 30 08:56:28 2025

Sun Nov 30 08:56:32 CET 2025
root@z420:/home/victor# phc_ctl ens3f1np1 get && date
phc_ctl[167514.750]: clock time is 1764489393.241777459 or Sun Nov 30 08:56:33 2025

Sun Nov 30 08:56:42 CET 2025
root@z420:/home/victor# phc_ctl ens3f1np1 get && date
phc_ctl[167520.866]: clock time is 1764489396.299897481 or Sun Nov 30 08:56:36 2025

Sun Nov 30 08:56:48 CET 2025
root@z420:/home/victor# 
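
The drift rate can be estimated from these readings. The sketch below assumes the bracketed number that phc_ctl prints is the host's monotonic time in seconds, and simply divides the NIC clock's progress by the host's progress between the first and last line. The function name clock_rate is hypothetical.

```python
import re
from decimal import Decimal

# Matches both "set clock time to ..." and "clock time is ..." lines, e.g.
# phc_ctl[167504.923]: clock time is 1764489388.328344519 or Sun Nov 30 ...
READING = re.compile(
    r"phc_ctl\[(\d+\.\d+)\]: (?:set clock time to|clock time is) (\d+\.\d+)"
)


def clock_rate(phc_output: str) -> Decimal:
    """Return NIC seconds elapsed per host second between the first and
    last reading; 1 would mean the NIC clock keeps pace with the host."""
    readings = [
        (Decimal(m.group(1)), Decimal(m.group(2)))
        for m in READING.finditer(phc_output)
    ]
    if len(readings) < 2:
        raise ValueError("need at least two phc_ctl readings")
    (host_start, phc_start), (host_end, phc_end) = readings[0], readings[-1]
    return (phc_end - phc_start) / (host_end - host_start)


# The set/get lines from the transcript above.
SAMPLE = """\
phc_ctl[167496.286]: set clock time to 1764489384.009790747 or Sun Nov 30 08:56:24 2025
phc_ctl[167504.923]: clock time is 1764489388.328344519 or Sun Nov 30 08:56:28 2025
phc_ctl[167514.750]: clock time is 1764489393.241777459 or Sun Nov 30 08:56:33 2025
phc_ctl[167520.866]: clock time is 1764489396.299897481 or Sun Nov 30 08:56:36 2025
"""

if __name__ == "__main__":
    print(f"NIC clock advances {clock_rate(SAMPLE):.4f} s per host second")
```

On these four lines the ratio comes out at roughly 0.5, which matches the date(1) output: the NIC clock is 12 seconds behind after 24 seconds.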

Actions #12

Updated by Victor Julien 10 days ago

Also not every NIC/driver that appears to support hwtimestamp allows phc_ctl to control it:

root@z420:/home/victor# ethtool -i ens5f0
driver: ixgbe
version: 6.17.0-7-generic
firmware-version: 0x00012b2c, 1.1197.0
expansion-rom-version: 
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
root@z420:/home/victor# ethtool -T ens5f0
Time stamping parameters for ens5f0:
Capabilities:
        hardware-transmit
        software-transmit
        hardware-receive
        software-receive
        software-system-clock
        hardware-raw-clock
PTP Hardware Clock: none
Hardware Transmit Timestamp Modes:
        off
        on
Hardware Receive Filter Modes:
        none
        ptpv1-l4-sync
        ptpv1-l4-delay-req
        ptpv2-event
root@z420:/home/victor# phc_ctl ens5f0 get
phc_ctl[167674.304]: interface ens5f0 does not have a PHC
root@z420:/home/victor# phc_ctl ens5f0 set
phc_ctl[167678.450]: interface ens5f0 does not have a PHC
root@z420:/home/victor# 
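
A check for this case can be scripted from the "ethtool -T" text. The sketch below only knows the two wordings that appear in this ticket ("PTP Hardware Clock: none" for ixgbe and "Hardware timestamp provider index: 6" for the x710); other ethtool versions may print something else. The function name phc_index is hypothetical.

```python
import re

# The two labels seen in the ethtool -T transcripts in this ticket.
PHC_PATTERNS = (
    re.compile(r"^PTP Hardware Clock:\s*(\S+)", re.MULTILINE),
    re.compile(r"^Hardware timestamp provider index:\s*(\S+)", re.MULTILINE),
)


def phc_index(ethtool_output: str):
    """Return the PHC index reported by `ethtool -T`, or None when the
    output lists no usable PHC (for example "PTP Hardware Clock: none")."""
    for pattern in PHC_PATTERNS:
        match = pattern.search(ethtool_output)
        if match and match.group(1).isdigit():
            return int(match.group(1))
    return None


# Abbreviated excerpts of the two transcripts in this ticket.
IXGBE_SAMPLE = """\
Time stamping parameters for ens5f0:
Capabilities:
        hardware-receive
        hardware-raw-clock
PTP Hardware Clock: none
"""
X710_SAMPLE = """\
Time stamping parameters for ens3f1np1:
Capabilities:
        hardware-receive
        hardware-raw-clock
Hardware timestamp provider index: 6
"""

if __name__ == "__main__":
    print("ixgbe:", phc_index(IXGBE_SAMPLE))
    print("x710:", phc_index(X710_SAMPLE))
```

Both interfaces advertise hardware-raw-clock, so that capability alone does not say whether phc_ctl can reach the clock.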

Actions #13

Updated by Victor Julien 10 days ago

Other than the x710 above, the various NICs I have seem to behave reasonably well. Small drifts in all of them though. Would there be negative side effects of adding a job that updates the NIC clock every second or minute or so?

Actions #14

Updated by Victor Julien about 14 hours ago

  • Status changed from Assigned to In Review
  • Assignee changed from OISF Dev to Victor Julien
  • Priority changed from High to Normal
Actions #15

Updated by Victor Julien about 4 hours ago

  • Subject changed from SOF_TIMESTAMPING_RAW_HARDWARE dangerous default leading to incorrect timestamps to af-packet: SOF_TIMESTAMPING_RAW_HARDWARE dangerous default leading to incorrect timestamps

Updated the ticket to reflect af-packet. Hw timestamping issues aren't unique to af-packet but AFAICS in our codebase af-packet was the only capture method setting it up.
