Clock Synchronization

Getting the right time

Paul Krzyzanowski

September 26, 2022 (Updated September 28, 2023)

Goal: Synchronize time across multiple computers.

Introduction

In most environments, we would like our computers to know and track the current time of day. By this, we mean UTC time, Coordinated Universal Time (or Temps Universel Coordonné), formerly called Greenwich Mean Time1. This is the international standard time of day, and we refer to it as a reference clock. From this, we can present time in the user’s local time zone and adjust for daylight saving time. Note that clients or users accessing a computer could be in different time zones. We want a consistent time so we can make sense of timestamps on file data, mail messages, databases, and system logs. These timestamps are used by software development environments (to know what still needs to be compiled based on file modification times), versioning tools, and database queries.

Keeping track of time accurately is difficult. No two clocks tick in perfect synchrony with each other. Quartz oscillators, which drive the timekeeping mechanisms of clock circuits, are not consistent over time, temperature, and pressure and no two oscillate at exactly the same rate. Atomic clocks are great but not practical to install in the vast majority of computers and network devices.

Because a clock doesn’t tick at the exact expected frequency, its accuracy will change over time. The rate at which a clock drifts from the true time is its clock drift. The difference between two clocks at any given instant is the clock offset or skew. The short-term variation of either network delays or clock frequencies is referred to as jitter. Real-time clocks on PCs may have a clock drift of around 1 microsecond per second, or 0.6 seconds per week, but in many cases it can be several seconds per day. Some systems, such as the Apple Watch, use a temperature-controlled crystal oscillator to reduce the effects of drift.

Compensating for clock drift

We can correct a clock that has drifted simply by setting the system’s clock to the current UTC time. However, we do not want to create the illusion of time moving backward for any process that is observing the clock. A sudden backward leap in time could confuse automated build environments, versioning checks, and event logs.

A linear drift compensation function adjusts the rate at which time is measured on a computer (e.g., the number of ticks that make up a second). It effectively slows the clock down or speeds it up by a constant rate. The function may be applied in two stages: first, speed up or slow down the clock more aggressively to bring it into sync with the true time relatively quickly; second, apply a compensation function that keeps the clock ticking at a rate that more closely approximates a perfect clock.

For example, suppose you observed that your clock developed a skew of 10 seconds (10,000 ms) over the course of a week. That tells you that the clock was measuring time faster than the reference clock and needs to be set back and effectively made to tick slower. Ten seconds per week equates to having to make each second approximately 16.5 microseconds slower [16.5 µs ≈ 10,000,000 µs ÷ (7 days × 24 hours/day × 60 minutes/hour × 60 seconds/minute)] for the clock to come closer to keeping accurate time. Before we do that, though, we may choose to slow the clock down more aggressively to match the reference time. If we want the clock to converge with the reference clock over a 24-hour period, the user and programs will still perceive a passage of 86,400 seconds (24 hours × 60 minutes/hour × 60 seconds/minute), but each second will be adjusted to be approximately 999,884 µs instead of 1,000,000 µs. We estimate this by taking the ratio of the period minus 10 seconds to the full period: (86,400 − 10) ÷ 86,400. After the 24-hour period elapses, we estimate that the clock is closer to the reference clock and then adjust for the detected drift. Clock frequencies and times are measured in discrete values, so we hope to adjust the clock’s tick closer to that of a reference clock but will still need to recalibrate periodically.
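To make the arithmetic concrete, here is a minimal sketch in Python of the two adjustments described above, using the hypothetical 10-second skew from the example:

    # Drift-compensation arithmetic from the example above (hypothetical numbers).
    WEEK_SECONDS = 7 * 24 * 60 * 60        # 604,800 seconds in a week
    DAY_SECONDS = 24 * 60 * 60             # 86,400 seconds in a day
    skew_us = 10_000_000                   # 10 seconds of accumulated skew, in microseconds

    # Long-term correction: how much each second must be adjusted to cancel the drift.
    drift_per_second_us = skew_us / WEEK_SECONDS        # ~16.5 µs per second

    # Aggressive catch-up phase: absorb the entire 10-second skew over one 24-hour period.
    catchup_ratio = (DAY_SECONDS - 10) / DAY_SECONDS    # ~0.999884
    catchup_second_us = 1_000_000 * catchup_ratio       # ~999,884 µs per second, as in the example

    print(f"long-term correction: {drift_per_second_us:.2f} µs per second")
    print(f"catch-up second length: {catchup_second_us:.0f} µs")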

Setting the time: Cristian’s algorithm

Most computers get the time by asking another system that is known to have a more accurate time; for instance, one connected to a GPS receiver. The problem with asking a time server for the time is that we now introduce network and processing delays: it takes time for the message to reach the server, be delivered to the process, and for the return message to be generated, sent to the client, and received by the process.

Cristian’s algorithm compensates for the delay of contacting a time service by assuming that the delays to the server and back are the same. It sets the time on a client to the time returned by the server plus an offset that is one-half of the transit time between the request and response messages: Tclient = Tserver + ½(Treceived − Tsent).

Because there could be unpredictable delays in either direction (the message might take longer to arrive, or the operating system might delay scheduling the server process to run, or the return message might take longer to arrive), we expect that the return time will have a degree of uncertainty.

Cristian’s algorithm also allows one to compute the maximum error of the new time stamp. The error is ± ½[(round-trip time) - (best-case round-trip time)]. Errors are additive. If you incur an error of ±50 msec and the server’s clock source has an error of ±80 msec, your clock’s error is now ±130 msec.
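A rough Python sketch of this follows; request_time_from_server is a stand-in for whatever call actually fetches the server’s clock reading (an assumption, not a real API):

    import time

    def cristian_sync(request_time_from_server, best_case_rtt=0.0):
        """Cristian's algorithm: estimate the current time as the server's reading
        plus half the round-trip time, and compute the maximum error bound."""
        t_sent = time.time()
        t_server = request_time_from_server()    # server's clock reading
        t_received = time.time()

        rtt = t_received - t_sent
        new_time = t_server + rtt / 2            # assumes symmetric delays
        max_error = (rtt - best_case_rtt) / 2    # ± error of the new timestamp
        return new_time, max_error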

No good clock: Berkeley algorithm

The Berkeley algorithm does not assume the presence of a server with an accurate time (i.e., one that keeps track of UTC time). Instead, one system is chosen to act as a leader (coordinator). It requests the time from all systems in the group (including itself) and computes a fault-tolerant average. Computing a fault-tolerant average requires selecting the largest subset of time values that do not differ from each other by more than some configured maximum amount. An arithmetic average of those time values is then computed. The algorithm then sends each machine an offset by which to adjust its clock.
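One simple way to sketch the fault-tolerant average in Python is to anchor the subset on each reading in turn and keep the largest group that falls within the allowed variance; this is an illustrative approximation, not the exact selection procedure of any particular implementation:

    def fault_tolerant_average(readings, max_variance):
        """Average only the largest group of clock readings that lie within
        max_variance of one of the readings, ignoring outliers."""
        best_subset = []
        for anchor in readings:
            subset = [t for t in readings if abs(t - anchor) <= max_variance]
            if len(subset) > len(best_subset):
                best_subset = subset
        return sum(best_subset) / len(best_subset)

    # The coordinator would then send each machine its offset:
    # offset_i = fault_tolerant_average(readings, max_variance) - readings[i]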

The Network Time Protocol (NTP)

The Network Time Protocol, NTP, was created to allow a large set of machines to synchronize their clocks. A set of computers act as time servers. This collection of systems is known as the synchronization subnet. The subnet is hierarchical, with a time server’s stratum defined as the number of hops it is from a reference time source. Reference time sources, such as GPS receivers or atomic clocks, are at stratum 0. Servers directly connected to a stratum 0 time source are at stratum 1, machines that synchronize from a stratum 1 server are at stratum 2, and so on.

NTP uses Cristian’s algorithm to set the time. The formula used by NTP is time_offset = ½ (T2 - T1 + T3 - T4) where

T1 is the time the message left the client,
T2 is the time it arrived at the server,
T3 is the time the response left the server,
and T4 is the time that it arrived at the client.

If we let TS = ½(T2 + T3) then we arrive at Cristian’s formula.
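Expressed in Python, the offset computation (together with the one-way delay estimate described below) is just a few lines:

    def ntp_offset_and_delay(t1, t2, t3, t4):
        """t1: request left the client, t2: request arrived at the server,
        t3: response left the server, t4: response arrived at the client."""
        offset = ((t2 - t1) + (t3 - t4)) / 2    # amount to add to the client's clock
        delay = ((t4 - t1) - (t3 - t2)) / 2     # estimated one-way network delay
        return offset, delay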

NTP encourages clients to try several servers. For each server contacted, the protocol computes

Offset:
The difference between the client’s clock and the server’s. This is how much the client needs to offset its clock to set it to the server’s time.
Delay:
This is an estimate of the one-way time spent sending or receiving the message. It is one-half of the quantity (round-trip time minus the time spent on the server).
Jitter:
Jitter measures the variation in delay among multiple messages to the server. It gives us an idea of how consistent the latency is between the client and server.
Dispersion:
Dispersion is the estimated maximum error for the computed offset. It takes into account the root delay (total delay, not just to the server but the delay from the server to the ultimate time source), estimated server clock drift, and jitter.

Given the choice of several NTP servers, NTP picks the server with the lowest dispersion. That is, the server from which it can get the most consistent and accurate time. If there is a tie, it then picks the one with the lowest stratum; that is, the one closest to the master time source. Note that this may result in a client synchronizing from a higher-stratum server even if it can contact a lower-stratum one (one that is closer to the time source). This may happen if the client can access that server more reliably and with less delay and jitter than a “better” server.
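A small sketch of that selection rule, using made-up measurements for hypothetical servers:

    # Hypothetical (name, dispersion, stratum) measurements for three candidate servers.
    servers = [
        ("ntp-a.example.com", 0.012, 2),
        ("ntp-b.example.com", 0.009, 3),
        ("ntp-c.example.com", 0.009, 2),
    ]
    # Prefer the lowest dispersion; break ties with the lowest stratum.
    best = min(servers, key=lambda s: (s[1], s[2]))
    print("synchronize from:", best[0])   # ntp-c: ties ntp-b on dispersion, lower stratum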

The Simple Network Time Protocol (SNTP)

NTP takes into account the quality of the timestamps it receives. To handle the challenges of synchronizing time over potentially unstable networks, NTP also includes algorithms that can filter and mitigate the effects of network jitter and varying delays. This makes NTP very accurate, capable of achieving precision in the order of milliseconds or even better under favorable conditions.

SNTP, the Simple Network Time Protocol, is the simplified cousin of NTP. It is not a new protocol but rather a simplified application of NTP. It forgoes many of the intricate features of NTP, which means it doesn’t have the algorithms to filter out network inconsistencies or to pick the best time source. Consequently, SNTP may not be as accurate as NTP, especially when operating over the public internet. Another distinction is in how each protocol deals with problems. NTP has mechanisms in place to manage unreliable time sources. If one source starts to falter, NTP can switch to another. SNTP, being simpler, often leans on a single time source and lacks the built-in redundancies of NTP to address inconsistencies.

NTP is the go-to for systems demanding high accuracy and stability, like servers and certain networking equipment. SNTP, with its lower computational demands, finds its niche in simpler devices or scenarios that don’t demand the precision of NTP. This might include certain embedded systems, devices in the realm of the Internet of Things, or any application that can make do with a basic level of time synchronization. Most PCs run SNTP.

Precision Time Protocol (PTP)

The Precision Time Protocol (PTP), defined by the IEEE 1588 standard, is designed to provide extremely precise time synchronization for devices in a network. PTP was designed with different goals than NTP. While NTP was designed for synchronizing time with machines across the Internet and remote servers, PTP was designed for local area networks: networks with low jitter and low latency. PTP is intended to achieve sub-microsecond precision among cooperating machines.

Master-Slave Architecture

PTP operates on a master-slave model. In a PTP-enabled network, one device acts as the master clock, while other devices act as slaves. Slaves synchronize their clocks to the master through a series of message exchanges, which allow them to account for transmission delays and thereby achieve precise synchronization.

Clock synchronization is initiated by a PTP master (unlike NTP, where the client host initiates the sync). The master announces its time to clients via a sync message. If a client is interested in synchronizing its clock, it must communicate back to the master with a delay request message and note the time at which that message was sent. The master sends back a delay response message containing the arrival time of the delay request message. With this data, and the assumption that uplink and downlink latency is the same, the client can compute the master’s timestamp with an adjustment for the network transit delay.
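Under that symmetric-latency assumption, the client’s offset from the master can be computed from the four timestamps in the exchange. A simplified Python sketch (ignoring details such as hardware timestamping, discussed below):

    def ptp_offset(t1, t2, t3, t4):
        """t1: master's time in the sync message,
        t2: client's local time when the sync message arrived,
        t3: client's local time when it sent the delay request,
        t4: master's time when the delay request arrived (from the delay response).
        Assuming symmetric latency, the result is how far the client's clock is
        ahead of the master's; the client subtracts it from its clock."""
        return ((t2 - t1) - (t4 - t3)) / 2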

Hardware Support

One of the reasons PTP can achieve such high accuracy is its reliance on hardware support. While software implementations of PTP exist, for the highest levels of precision, hardware timestamping at the level of the network interface cards (NICs) and switches is often used. In some environments, PTP can be 10,000x more precise than NTP.

Flexibility

PTP allows for the possibility of using multiple master clocks in a hierarchical fashion to ensure redundancy and resilience. This means that if one master clock fails, another can take over its role, ensuring uninterrupted precise time synchronization.

While PTP can be used in a variety of networking contexts, it has found specific applications in industries like telecommunications, power utilities, and advanced manufacturing.

References


  1. The abbreviation UTC is a compromise between the English Coordinated Universal Time and the French Temps Universel Coordonné.
