Problematic second - SarvenDev
Abstract
The article discusses the challenges and complexities involved in handling time in IT systems, with a focus on the issues surrounding leap seconds. It covers the historical evolution of the definition of a second, the introduction of leap seconds to compensate for variations in the Earth's rotation, and the various problems that can arise from the addition of these extra seconds.
Q&A
[01] Time Measurement and Synchronization in IT Systems
1. What are the main time-related problems in IT systems?
- Measuring time accurately and keeping clocks synchronized between machines in distributed systems
2. How is time defined and measured in IT systems?
- Initially, a second was defined as 1/86,400 of a mean solar day; in 1960 it was redefined as 1/31,556,925.9747 of a tropical year
- In 1967, the second was redefined in terms of the cesium-133 atom, as the duration of 9,192,631,770 periods of the radiation corresponding to the transition between two energy levels of the atom
- Atomic clocks are the most accurate clocks, but the time they measure still drifts away from time based on the Earth's rotation, which is irregular and gradually slowing
- Leap seconds are added to compensate for this difference, usually at the end of the last day of June or December
3. What are the different types of clocks in computers?
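- Real-time clocks: Return the actual date and time, but can jump forwards or backwards when NTP synchronization corrects them
- Monotonic clocks: Only move forward and typically offer fine resolution (microseconds or better); they do not jump when NTP adjusts the wall clock, which makes them suitable for measuring elapsed time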
4. What are the consequences of poor clock synchronization or out-of-sync clocks?
- Can lead to data loss and corruption, because the system may keep operating incorrectly without anyone noticing
- In multi-leader replication, clock skew between nodes can cause writes to be applied in the wrong order, resulting in incorrect data
- Logical clocks based on incrementing counters should be used instead of relying solely on NTP synchronization
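
The logical-clock idea in the last answer can be illustrated with a Lamport-style counter. This is a minimal sketch, not the article's implementation: the class name `LamportClock` and its methods `tick` and `receive` are hypothetical names chosen for illustration.

```python
class LamportClock:
    """Logical clock: a counter that orders events without wall-clock time."""

    def __init__(self) -> None:
        self.counter = 0

    def tick(self) -> int:
        # Local event (for example, a write): advance the counter.
        self.counter += 1
        return self.counter

    def receive(self, remote_counter: int) -> int:
        # On receiving a message, move past both our own counter
        # and the counter carried by the message.
        self.counter = max(self.counter, remote_counter) + 1
        return self.counter


node_a = LamportClock()
node_b = LamportClock()

write_a = node_a.tick()            # node A performs a write
write_b = node_b.receive(write_a)  # node B sees A's write, then records its own

print(write_a, write_b)
```

Because node B's counter is derived from node A's, the second write is always ordered after the first, regardless of how far apart the two machines' wall clocks are.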
[02] Leap Seconds and Their Impact
1. What is the significance of leap seconds?
- Leap seconds are added to keep time measured by atomic clocks in step with time based on the Earth's rotation
- They have been added 27 times in the past 52 years, averaging about once every two years
2. What are the problems caused by the addition of leap seconds?
- Various software bugs and performance issues have been observed, such as the "hrtimer" code in the Linux kernel going "crazy" and repeatedly waking up applications, leading to high CPU usage and system crashes
- The issues have affected databases, application servers, and even caused spikes in energy consumption in data centers
3. How have different systems handled the addition of leap seconds?
- The simplest approach is to announce the leap second via NTP servers and let the clock step back, so the same second appears twice
- Windows ignores the leap second and jumps to the correct time on the next synchronization
- Google's "leap smear" approach gradually adds milliseconds throughout the day instead of adding a whole second at once
4. What is the future plan for handling leap seconds?
- The decision has been made to stop adding leap seconds by 2035 at the latest, allowing UTC and UT1 to drift apart until a better method for accounting for the accumulated difference is developed
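
The "leap smear" approach mentioned above can be sketched as a simple calculation. This is an illustration under stated assumptions, not Google's actual implementation: it assumes the extra second is spread linearly over a 24-hour window, and the names `SMEAR_WINDOW_S`, `LEAP_S`, and `smear_offset` are hypothetical.

```python
SMEAR_WINDOW_S = 86_400  # assumption: smear spread linearly over 24 hours
LEAP_S = 1.0             # one positive leap second to absorb


def smear_offset(elapsed_s: float) -> float:
    """Return how much of the leap second has been absorbed
    after `elapsed_s` seconds of the smear window."""
    # Clamp so times before or after the window give 0 or the full second.
    clamped = min(max(elapsed_s, 0.0), SMEAR_WINDOW_S)
    return LEAP_S * clamped / SMEAR_WINDOW_S


print(smear_offset(0))       # 0.0 (start of the window)
print(smear_offset(43_200))  # 0.5 (halfway through)
print(smear_offset(86_400))  # 1.0 (whole second absorbed)
```

Under these assumptions each second of the window is stretched by 1/86,400 of a second (roughly 11.6 microseconds), so no clock ever shows the same second twice.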
[03] Other Time-related Issues in IT
1. What was the "Y2K" problem, and how was it addressed?
- In the past, years were often stored using only their last two digits, which caused problems once dates passed the year 2000, for example when checking whether a credit card was still valid
- Fixing this required significant work to update software, at a considerable cost
2. What is the "2038 problem," and how will it be addressed?
- On Unix-based systems, time is stored as the number of seconds since January 1, 1970. When that count is held in a 32-bit signed integer, its maximum value is 2,147,483,647, which corresponds to 03:14:07 UTC on January 19, 2038
- The solution is to switch to using 64-bit integers, which will significantly increase the range and delay the next such problem until the year 292,277,026,596
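
The arithmetic behind the 2038 limit can be checked directly. The sketch below uses only the Python standard library; the helper `wrap_int32` is a hypothetical function that simulates how a signed 32-bit counter wraps around, since Python integers do not overflow on their own.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # 2,147,483,647


def wrap_int32(value: int) -> int:
    """Simulate two's-complement overflow of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


# Last second representable in a signed 32-bit counter.
last_moment = EPOCH + timedelta(seconds=INT32_MAX)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# One second later the counter wraps to the most negative value.
after_overflow = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))
print(after_overflow.isoformat())  # 1901-12-13T20:45:52+00:00
```

The second result shows why the overflow matters: a system that wraps around would suddenly interpret the current time as a date in December 1901.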