Bug #7585
SOF_TIMESTAMPING_RAW_HARDWARE dangerous default leading to incorrect timestamps
Description
By default, in af-packet mode Suricata attempts to set SOF_TIMESTAMPING_RAW_HARDWARE on a monitored interface. Unfortunately, this can mean that the timestamps are coming from an unsynchronized timer on the NIC.
Background: many modern NICs are designed to support PTP (precision time protocol), and to facilitate that the NIC has its own clock. That clock is kept accurate by some timesource and managed via userspace tools such as the linuxptp project. Regardless of whether PTP is in use, the clock seems to be initialized on boot. Then, the NIC clock is never again synced unless steps are taken to do so. The result is that the NIC clock and the system clock begin to drift apart. How quickly that happens depends on the accuracy of the NIC clock and how long the system runs. Some real-life examples:
a system with about 90 days of uptime and 3 NICs:
# for i in /dev/ptp* ; do phc_ctl $i get ; done ; date
phc_ctl[8147391.663]: clock time is 1741378468.001545494 or Fri Mar 7 20:14:28 2025
phc_ctl[8147391.669]: clock time is 1741378458.824065546 or Fri Mar 7 20:14:18 2025
phc_ctl[8147391.676]: clock time is 1741378495.935860299 or Fri Mar 7 20:14:55 2025
Fri 07 Mar 2025 08:17:01 PM UTC
in this case one interface is not in use and reflects the last system boot time:
# for i in /dev/ptp* ; do phc_ctl $i get ; done ; date
phc_ctl[8146929.952]: clock time is 1741378259.724956768 or Fri Mar 7 20:10:59 2025
phc_ctl[8146929.955]: clock time is 1733231435.818002320 or Tue Dec 3 13:10:35 2024
Fri 07 Mar 2025 08:12:43 PM UTC
20+ minutes slow, and each NIC is significantly different from the others:
# for i in /dev/ptp* ; do phc_ctl $i get ; done ; date
phc_ctl[90128374.727]: clock time is 1741377498.633167475 or Fri Mar 7 14:58:18 2025
phc_ctl[90128374.733]: clock time is 1741377452.185620309 or Fri Mar 7 14:57:32 2025
phc_ctl[90128374.739]: clock time is 1741377355.754238877 or Fri Mar 7 14:55:55 2025
Fri 07 Mar 2025 03:21:54 PM EST
nearly 2 minutes in the future, after 60 days uptime:
# for i in /dev/ptp* ; do phc_ctl $i get ; done ; date
phc_ctl[5243773.229]: clock time is 1741379536.863106904 or Fri Mar 7 20:32:16 2025
phc_ctl[5243773.232]: clock time is 1741379536.865495696 or Fri Mar 7 20:32:16 2025
phc_ctl[5243773.234]: clock time is 1736135658.288802528 or Mon Jan 6 03:54:18 2025
phc_ctl[5243773.236]: clock time is 1741379536.869478312 or Fri Mar 7 20:32:16 2025
Fri Mar 7 08:30:25 PM UTC 2025
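The comparison above can also be scripted so that the offset is printed directly instead of being read off by eye. The following is a minimal sketch, not part of the original report: it assumes Linux, read access to /dev/ptp*, and the kernel's dynamic POSIX clock convention for turning a file descriptor into a clock id. The function names are made up for illustration.

```python
import glob
import os
import time

CLOCKFD = 3  # low bits used by the kernel for fd-based (dynamic) POSIX clocks


def fd_to_clockid(fd: int) -> int:
    """Python equivalent of the kernel's FD_TO_CLOCKID(fd) macro."""
    return (~fd << 3) | CLOCKFD


def phc_offset_seconds(device: str) -> float:
    """Return (NIC clock - system clock) in seconds for one /dev/ptpN device."""
    fd = os.open(device, os.O_RDONLY)
    try:
        phc_ns = time.clock_gettime_ns(fd_to_clockid(fd))
        sys_ns = time.clock_gettime_ns(time.CLOCK_REALTIME)
    finally:
        os.close(fd)
    return (phc_ns - sys_ns) / 1e9


if __name__ == "__main__":
    for dev in sorted(glob.glob("/dev/ptp*")):
        try:
            print(f"{dev}: {phc_offset_seconds(dev):+.6f} s")
        except OSError as exc:
            # Missing permissions or a driver without a usable clock.
            print(f"{dev}: {exc}")
```

A negative value means the NIC clock is behind the system clock. The two clocks are read one after the other, so the result is only good to well under a millisecond, which is plenty for spotting offsets of seconds or minutes.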
Some of these are Intel hardware, some Broadcom. They include both server-grade and consumer-grade models. Here's how this affects packet capture:
# date ; tcpdump -n -i enp1s0 -j adapter_unsynced
Fri 07 Mar 2025 08:51:07 PM UTC
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp1s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:51:07.431126 IP [outgoing]
20:48:33.458943 IP [incoming]
20:51:07.506481 IP [outgoing]
20:48:33.539606 IP [incoming]
20:51:07.610049 IP [outgoing]
20:48:33.640774 IP [incoming]
20:51:07.714061 IP [outgoing]
Note that when using raw hardware timestamps, the incoming packets have the wrong time (here about 2.5 minutes behind the outgoing ones). The "-j adapter_unsynced" option uses SOF_TIMESTAMPING_RAW_HARDWARE. libpcap also supports "-j adapter", which uses SOF_TIMESTAMPING_SYS_HARDWARE, but that option is no longer available in Linux.
How did this escape notice for so long? I suspect because:
- it doesn't become noticeable until the system has been up for a significant amount of time
- if the clock runs slow instead of fast the results could be interpreted as a processing delay
- it is difficult to identify that the data is wrong
- it is hardware and kernel configuration dependent
- there is no problem if the system is using PTP to synchronize network interfaces
- it only affects af-packet mode
- if someone does notice a problem it goes away with a reboot and they chalk it up as a fluke
If eve logs are available with stats the problem can be identified fairly easily because the stat timestamp is based on the system clock rather than the NIC clock:
["2025-03-01T23:32:13.630827+0000","tls"]
["2025-03-01T23:19:21.712747+0000","stats"]
["2025-03-01T23:32:13.662655+0000","tls"]
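As a rough illustration of that check, the sketch below takes (timestamp, event_type) pairs in log order, like the three shown above, and reports how far each stats record lags the packet-based event logged just before it. It is an assumption-laden example rather than a Suricata tool: the function name is hypothetical, and it assumes every non-stats event carries a packet-derived timestamp in the format shown.

```python
from datetime import datetime

EVE_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def stats_skew_seconds(records):
    """For each stats record, return (previous packet event time - stats time).

    records: iterable of (timestamp, event_type) pairs in log order.
    A value near zero is expected; a large positive value means packet
    timestamps run ahead of the system clock, a large negative one behind.
    """
    skews = []
    last_packet_ts = None
    for ts_text, event_type in records:
        ts = datetime.strptime(ts_text, EVE_TS_FORMAT)
        if event_type == "stats":
            if last_packet_ts is not None:
                skews.append((last_packet_ts - ts).total_seconds())
        else:
            last_packet_ts = ts
    return skews


sample = [
    ("2025-03-01T23:32:13.630827+0000", "tls"),
    ("2025-03-01T23:19:21.712747+0000", "stats"),
    ("2025-03-01T23:32:13.662655+0000", "tls"),
]
print(stats_skew_seconds(sample))
```

For the sample above the skew is about 772 seconds, i.e. the NIC clock is nearly 13 minutes ahead of the system clock.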
Testing is possible (assuming the right networking hardware) by simply using date(1) to change the time; the NIC clock will not reflect the change.
It is possible to set the NIC clock using the linuxptp phc_ctl command and the keyword "set"; this must be done on each interface, and must be done periodically as the clocks drift. I am not aware of a tool to synchronize the NIC clock to the system clock rather than the other way around.
So what should be done in suricata? Hardware timestamping should be disabled by default. A configuration option to enable it may be desired in cases where the clock is known to be good, e.g., when the system is synchronizing time using PTP. The manual should clearly reflect the potential problems with hardware timestamps.
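To make the proposal concrete, here is a minimal sketch of what "off by default, opt-in by configuration" could look like on a packet socket. It is not Suricata's code: the `use_hardware` parameter and both function names are hypothetical, the numeric constants are copied from the Linux headers because Python's socket module does not export them, and a real capture would use a memory-mapped ring rather than a plain socket. It only shows that the PACKET_TIMESTAMP option is left untouched (so the kernel's software timestamp is used) unless hardware timestamps are explicitly requested.

```python
import socket

# Values from Linux kernel headers (assumed stable; not exported by Python).
SOL_PACKET = 263                        # <linux/socket.h>
PACKET_TIMESTAMP = 17                   # <linux/if_packet.h>
SOF_TIMESTAMPING_RAW_HARDWARE = 1 << 6  # <linux/net_tstamp.h>
ETH_P_ALL = 0x0003                      # <linux/if_ether.h>


def configure_timestamp_source(sock, use_hardware=False):
    """Request raw NIC timestamps only when explicitly enabled.

    Returns the flag value that was set, or 0 if the software default is kept.
    """
    if not use_hardware:
        return 0
    sock.setsockopt(SOL_PACKET, PACKET_TIMESTAMP, SOF_TIMESTAMPING_RAW_HARDWARE)
    return SOF_TIMESTAMPING_RAW_HARDWARE


def open_capture_socket(interface, use_hardware=False):
    """Open an AF_PACKET socket on `interface` (Linux only, needs CAP_NET_RAW)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind((interface, 0))
    configure_timestamp_source(sock, use_hardware)
    return sock
```

With this shape, a deployment that synchronizes its NIC clocks via PTP would pass `use_hardware=True`, and everyone else would get timestamps based on the system clock.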
Updated by Michael Stone 24 days ago
By "I am not aware of a tool to synchronize the NIC clock to the system clock rather than the other way around" I mean to say that I'm not aware of a tool that does this on a continuous basis without jumping the time (adjtime), as NTP does to the system clock. The phc_ctl set command will set the NIC clock, but does so atomically.
Updated by Victor Julien 24 days ago
- Related to Feature #1954: runtime option/flag to disable hardware timestamp support added
Updated by Victor Julien 24 days ago
- Related to Bug #7115: dpdk: timestamping packets through TSC does not yield the same time as kernel time added
Updated by Victor Julien 24 days ago
Thanks for your report, Michael. It looks like the long-standing request to make this configurable (#1954) is a good first step. It also sounds quite similar to the DPDK ticket #7115.
@Eric Leblond @Lukas Sismis any thoughts?