Y2K and Date Problems

Date Problems

What is the Y2K problem? Where did it come from?

What other date problems exist?

How can they be avoided in future?

What are the User Interface issues with date problems?

What are the Software Engineering issues with date problems?

References

W.M.O'Neil, Time and the Calendars, Sydney University Press, 1975.

More than you could ever want to know about the origins of dates, day names and such.

The Origins of the Calendar

Years ago, back in the days of the Roman Republic, around 200 BC, the Roman calendar was in chaos.

The years didn't match up with the seasons. Months were unequal, and there could be 13 months in the year:

   1. Martius      7. September
   2. Aprilis      8. October
   3. Maius        9. November
   4. Junius       10.December
   5. Quintilis    11.Januarius
   6. Sextilis     12.Februarius
                   13.Intercalaris (added only in some years)

The Solar Year: 365.25 days

Part of the problem is that the solar year is close to 365 and a quarter days in length - a very odd number.

The Babylonians knew the length of the year. They observed a 360-day calendar of 12 months, each of 30 days, plus a 13th "short" month, of 5 days, considered unlucky.

The Egyptians also observed a similar calendar. Many of the civilisations around the Mediterranean at that time had very good astronomers and knew the length of the year to a great degree of accuracy.

But the Romans had a haphazard approach to the calendar, and somehow got out of sync with the seasons.

46 BC: The Year of Confusion

Julius Caesar commissioned a Greek astronomer, Sosigenes, to work out how to fix the Roman calendar. In the year 46 BC, his recommendations were implemented.

The year 46 BC was 445 days in length. Normal counting of months was suspended, and the "fix-up" month, Intercalaris, was removed from the calendar entirely.

The year was to be 365 days in length. Months were to be 30 or 31 days in length, alternating, to make up that number.

A new system of "leap years" was to be implemented. Every fourth year would have an additional day, inserted after 23rd of Februarius, to account for the extra quarter day.

The year 46 BC became known as the "Year of Confusion", and the resulting "Julian calendar" was to remain in use for over 1600 years.

From BC to AD

Of course, the year 46 BC wasn't known as 46 BC to the Romans, because the era of Christianity had not yet begun.

It was only circa AD 525, after the Roman Empire had adopted Christianity as its official religion, that the calendar years were changed to a new system. Years were reckoned from January 1st, AD 1 (Anno Domini, or the Year of the Lord), taken as the year of the birth of Jesus, and all dates before that were labelled BC, or Before Christ. (There is no year 0! Does this mean the year 2000 is the start or the end of a millennium? Think about it...)

The Washington Post quotes Senate Banking Committee Chairman Phil Gramm, R-Texas, at a hearing on July 30, 1999, talking about Y2K:

"Well, it seems to me we ought to be encouraged that in the year 1000, they had to add a new digit, and yet no evidence of economic disruption. And then a millennium before, we had dates going down, and then they started going up, and yet no evidence of disruption or chaos in the economy. So if they could do it then, surely we can deal with it now, it seems to me."

Huh?

Not quite 365.25 days

Unfortunately, the year is just short of 365.25 days in length. It's actually closer to 365.2425 days.

In AD 1563, the Council of Trent called for a correction to the calendar, which had by then drifted about 10 days out of step with the seasons.

In AD 1582, Pope Gregory XIII implemented the solution proposed by Christopher Clavius, a mathematician, in what is now known as the "Gregorian calendar".

The Gregorian Calendar

The solution offered by Christopher Clavius had two main elements:

1. There are too many leap years, because the year is close to 365.24 days in length, so 3 leap years in every 400 are to become normal, non-leap years. The years chosen were multiples of 100 which are not also multiples of 400.

So, 1700, 1800, 1900, 2100, 2200, 2300, 2500 etc are not leap years.

2. The 10 extra days are to be removed from the calendar.

In AD 1582, Thursday October 4 was followed by Friday October 15. The change was adopted fairly quickly by most Catholic countries, but Protestant countries were slower to agree to it.
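The arithmetic behind the first rule can be checked directly. Dropping 3 leap years in every 400 leaves 97 leap days per 400 years, so the mean length of the calendar year works out as follows:

```latex
\[
365 + \frac{1}{4} - \frac{1}{100} + \frac{1}{400}
  = 365 + \frac{97}{400}
  = 365.2425 \text{ days}
\]
```

This matches the figure of 365.2425 days quoted above for the length of the year.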

Britain's legacy

Britain was one of the slowest of countries to implement the new calendar.

In AD 1752, Great Britain changed its calendar, in the following ways:

1. The year was now to start on January 1, not March 25 as had been the practice up to and including AD 1751. Under the old practice, March 24th 1750 was followed by March 25th 1751. (This was the practice because of the equinox and the seasons)

2. The year 1751 began on March 25 as usual, but ended on December 31st (early), so that 1752 began on January 1

3. September 2nd 1752 was followed by September 14th 1752

Protesters at the time chanted outside Parliament "Give us back our 11 days!"

Try using the Unix command cal 1752 - you'll see!

The result

The change carried through to all countries which were part of the British Empire, including the American colonies.

Australia had not yet been conquered by the British, and so the indigenous peoples were blissfully unaware of these peculiarly European problems.

Most countries use the Gregorian system of leap years to keep their secular calendars correct, but religious and cultural events are often calculated using different systems. Easter is one example.

The calendar is still not perfectly precise, but it's close.

The Problem: Computers

Back in the days of the Roman Republic, if Julius Caesar pronounced that the year was to be 445 days in length, there may have been complaints, and debate in the Senate, but ultimately, it was done.

When Pope Gregory made the changes later on, he had Papal authority to implement those changes, and even Protestant countries eventually had to listen.

These days, if changes needed to be made, it would not be so easy. There are so many businesses using so many computers, using so much software which relies on date calculations, that fixing a problem like Y2K is a big issue.

Problem: 2-digit years

One of the problems is the use of 2-digit years, instead of 4 digits.

The date 1/1/97 could mean 1/1/1997, or 1/1/2097, or some other date. If the software assumes 19xx, then the year 2000 will be represented internally as "00", which might print as "1900".

This might be a small matter of incorrect printouts, or a larger problem involving calculation of interest in a bank account application.

Check: Look at 1/1/2000 to see how the software copes.

Solution: re-write all the software which uses that system, and fix all existing databases to use 4-digit years.
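To see why this matters for something like an interest calculation, here is a minimal sketch. The two function names are made up for illustration; the point is only that subtracting 2-digit years goes wrong as soon as the century rolls over.

```c
/* Hypothetical helper: elapsed years computed from 2-digit years only.
   For an account opened in "97" and checked in "00" this returns -97. */
int elapsed_years_2digit(int from_yy, int to_yy)
{
    return to_yy - from_yy;
}

/* The same calculation with full 4-digit years.
   For 1997 to 2000 this returns 3, as expected. */
int elapsed_years_full(int from_year, int to_year)
{
    return to_year - from_year;
}
```

Any interest, age or expiry calculation built on the first function would be working with -97 years instead of 3.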

Problem: Leap Year Calculations

Another problem is that leap year calculations might be incorrect.

Here's a correct implementation (in C) of a leap year computation. The parameter year is assumed to be the full 4-digit year as an integer.

   int is_leap_year (int year)   /* correct */
   {
      if (year % 400 == 0)  /* multiple of 400 true */
         return 1;
      if (year % 100 == 0)  /* multiple of 100 false */
         return 0;
      if (year % 4 == 0)    /* multiple of 4 true */
         return 1;
      return 0;             /* false */
   }

The modulus operator % calculates the remainder after dividing the first number by the second. If the remainder is zero, the first number must be an exact multiple of the second number.

Problem: Leap Year Wrong

Incorrect calculations of leap years may abound in software and hardware. Here's an incorrect method:

   int is_leap_year (int year)   /* incorrect!! */
   {
      if (year % 4 == 0)
         return 1;
      return 0;
   }

This calculation is wrong because it assumes every fourth year is a leap year.

It's a time bomb waiting to go off. It's deceptive too, because it'll actually work for the year 2000, since 2000 is a leap year (multiple of 400).

This program will fail in the year 2100.

Check: February 29th 2100 should not exist

Solution: look for "% 4" using a text editor, and rewrite it.
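A text search only finds candidates. One way to confirm whether a routine is actually wrong is a small table-driven check against years whose status is known. The sketch below is an illustration, not a complete test suite; the names count_leap_failures, leap_every_fourth_year and leap_gregorian are invented here.

```c
#include <stddef.h>

/* Years whose leap status follows directly from the Gregorian rules. */
struct leap_case { int year; int expected; };

static const struct leap_case known_cases[] = {
    {1900, 0}, {1996, 1}, {1999, 0}, {2000, 1}, {2100, 0}, {2400, 1}
};

/* Returns how many of the known years the given function gets wrong. */
int count_leap_failures(int (*leap_fn)(int))
{
    size_t i;
    int failures = 0;
    for (i = 0; i < sizeof known_cases / sizeof known_cases[0]; i++) {
        /* Compare as booleans: any non-zero result counts as "leap". */
        if ((leap_fn(known_cases[i].year) != 0) != known_cases[i].expected)
            failures++;
    }
    return failures;
}

/* The every-fourth-year version, included for comparison. */
int leap_every_fourth_year(int year)
{
    return year % 4 == 0;
}

/* The full Gregorian rule written as a single expression. */
int leap_gregorian(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}
```

Passing leap_every_fourth_year to count_leap_failures reports two failures (1900 and 2100), while leap_gregorian reports none.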

Problem: Leap Year Wrong

Here's another bad leap year calculation:

   int is_leap_year (int year)   /* incorrect!! */
   {
      if (year % 100 == 0)
         return 0;
      if (year % 4 == 0)
         return 1;
      return 0;
   }

This implementation says that all multiples of 100 are not leap years. So the year 2000 will be assumed to be a normal, non-leap year.

Check: February 29th 2000 should exist and be a Tuesday. March 1st 2000 should be a Wednesday.

Solution: test for it and fix it, sooner rather than later.
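The checks above can also be automated. The following is a sketch under a few assumptions: the helper names are invented, dates are Gregorian, and the day-of-week formula is the compact one usually attributed to Tomohiko Sakamoto rather than anything from the C library.

```c
/* Full Gregorian leap-year rule. */
static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Number of days in the given month (1-12), or 0 for an invalid month. */
int days_in_month(int year, int month)
{
    static const int days[12] = {31, 28, 31, 30, 31, 30,
                                 31, 31, 30, 31, 30, 31};
    if (month < 1 || month > 12)
        return 0;
    if (month == 2 && is_leap(year))
        return 29;
    return days[month - 1];
}

/* Non-zero if the date exists in the Gregorian calendar. */
int date_exists(int year, int month, int day)
{
    return day >= 1 && day <= days_in_month(year, month);
}

/* Day of the week for a valid Gregorian date: 0 = Sunday ... 6 = Saturday.
   The caller must pass a month in the range 1-12. */
int day_of_week(int year, int month, int day)
{
    static const int offsets[12] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3)
        year -= 1;   /* January and February count with the previous year */
    return (year + year / 4 - year / 100 + year / 400
            + offsets[month - 1] + day) % 7;
}
```

With these, date_exists(2000, 2, 29) is true and day_of_week(2000, 2, 29) gives 2 (Tuesday), while date_exists(2100, 2, 29) is false.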

Problem: Financial Year

It's possible the financial year (which is offset from the calendar year in many countries) will cause a similar problem.

If the software makes calculations at the end of the financial year, bugs may not become apparent until that time.

Check: (Australia) June 30 to July 1 transition

Problem: Pay Day

Many businesses have dedicated computers for their payroll systems. Often these computers are only activated on pay-day to print out the cheques for employees.

The following days may reveal bugs in payroll software:

Problem: Year 2001 or Year 2050

It may be that the software stores a 2-digit offset from some number other than zero.

These problems could start to hit from January 1, 2001 to anywhere through the 21st century.

Example: fixing Y2K might cause another bug later on.

   int fix_year (int two_digit_year) /* bug!! */
   {
      if (two_digit_year > 49)
         return 1900 + two_digit_year;
      else
         return 2000 + two_digit_year;
   }

Question: What happens in the year 2050?

Solution: don't fix a bug with a bug!
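To make the question concrete, this sketch repeats fix_year so that it compiles on its own and adds one made-up helper, elapsed_after_fix, which measures the gap between two stored 2-digit years.

```c
/* The windowing "fix" from above, repeated so this sketch stands alone. */
int fix_year(int two_digit_year)
{
    if (two_digit_year > 49)
        return 1900 + two_digit_year;
    else
        return 2000 + two_digit_year;
}

/* Hypothetical helper: years elapsed between two stored 2-digit years. */
int elapsed_after_fix(int from_yy, int to_yy)
{
    return fix_year(to_yy) - fix_year(from_yy);
}
```

A record stored as "49" is read as 2049, but one stored a year later as "50" is read as 1950, so elapsed_after_fix(49, 50) comes out as -99 rather than 1.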

Problem: Databases

There are many potential problems in software, and in the hardware clocks inside computers, but a major problem lies in the data which has already been collected by computers, in the form of databases, such as bank account details.

Many existing databases may only keep the year as 2 digits. When upgrading the software, a decision has to be made as to what to do with the old data.

A complete fix is to write a conversion program which updates all of your data to a 4-digit system. Sometimes this is not practicable.

Temporary solutions, such as special code to read the old databases, can muddy the waters and lead to future problems.

There is no good solution, except to do better in future and get it right the first time.

Problem: Year 2038

There is a well-known problem, which will occur in the year 2038, in some Unix systems.

The C-library interface to the hardware clock uses a function called time, which returns an integer, the number of seconds elapsed since January 1, 1970.

Integers in Unix are typically signed 32-bit quantities, which have 2 to the power of 32 different values. Because they are signed, there are really only 2 to the power of 31 non-negative representable numbers.

It so happens that this number of seconds equals a little over 68 years. Now, 1970 + 68 = 2038, so in January 2038 some system clocks may read 1970, or even December 1901 (roughly 1970 - 68).
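The wrap-around can be modelled without waiting for 2038. This is a sketch only: tick_32bit and utc_year_of are invented names, the wrap is done explicitly in 64-bit arithmetic so as not to rely on overflow behaviour, and utc_year_of assumes the platform's gmtime accepts the values given (it returns -1 otherwise).

```c
#include <stdint.h>
#include <time.h>

/* Model of a signed 32-bit seconds counter advancing by one second. */
int32_t tick_32bit(int32_t seconds)
{
    int64_t next = (int64_t)seconds + 1;
    if (next > INT32_MAX)
        next -= INT64_C(4294967296);   /* wrap around by 2 to the power of 32 */
    return (int32_t)next;
}

/* Calendar year (UTC) for a count of seconds since January 1, 1970,
   or -1 if this platform's time_t or gmtime cannot handle the value. */
int utc_year_of(int64_t seconds)
{
    time_t t = (time_t)seconds;
    struct tm *utc;
    if ((int64_t)t != seconds)
        return -1;
    utc = gmtime(&t);
    return utc != NULL ? utc->tm_year + 1900 : -1;
}
```

On a typical system, utc_year_of(INT32_MAX) gives 2038, and one tick later the counter holds INT32_MIN, which lands in late 1901.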

Solution: migrate Unix-based operating systems to 64-bit integers by the year 2038.

Problem: 1900 offsets

Another problem in Unix: the struct tm structure used in the C standard library time functions contains an integer known as tm_year which records the "years since 1900".

This is an offset. It should have just been the year, but the designers thought they'd be clever and make the user add 1900 to the number to get the actual year. Silly people.

A buggy program might print the year thus:

   printf("Year 19%d\n", t.tm_year); /* wrong!! */

Question: what happens in the year 2000?

Watch out for overflowing string buffers and memory problems!
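For comparison, here is a sketch of the usual way to handle tm_year: add 1900, and use snprintf so that a longer-than-expected result cannot overrun the buffer. The function names format_year and format_year_buggy are made up for this example.

```c
#include <stdio.h>
#include <time.h>

/* Writes e.g. "Year 2000" into buf. Returns the length the full text needs,
   as snprintf does, so the caller can detect truncation. */
int format_year(const struct tm *t, char *buf, size_t size)
{
    return snprintf(buf, size, "Year %d", t->tm_year + 1900);
}

/* The buggy pattern, kept only to compare the output lengths. */
int format_year_buggy(const struct tm *t, char *buf, size_t size)
{
    return snprintf(buf, size, "Year 19%d", t->tm_year);
}
```

For tm_year equal to 100, the first function needs 9 characters and the buggy one needs 10, which is exactly the kind of extra character that overflows a buffer sized for "Year 19xx".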

Problem: Year 2100

There are a number of problems which may come up in the year 2100:

Firstly, programmers and hardware designers will have forgotten all about Y2K. People will once again be assuming two digits is 'enough'.

Secondly, the year 2100 is different to 2000. It is not a leap year. Poor code which doesn't correctly calculate which years are leap years will be exposed in 2100. (And may cause programmers to write code which will fail in 2400, which is a leap year).

Homework: Try setting your home computer's system clock to the year 2100. Can it be done? Does February 29 exist? (It shouldn't.)

Problem: Y10K

This one's a long way down the track, but it's possible.

If you assume a year is 4-digits, what happens when the year changes to 5 digits?

Remember what the COBOL programmer writing applications for the bank said in 1962 - "two digits should be enough for the foreseeable future"

Lesson: the software you write can outlast your employment, and even your lifetime. Make the next person to maintain your code pleasantly surprised.

Problem: Date Abbreviations

Americans abbreviate dates differently to Australians or Britons:

The date 1/3 might mean the 1st of March, or the 3rd of January.

This is an issue with input as well as display of dates.

Solution: when displaying information, use full or abbreviated month names, not numbers, e.g. "Mar 1" or "1 March". This is easy to do: it's just a table lookup.

Solution 2: when inputting information, provide a drop-down menu or selectable list of months, rather than allowing the user to type a number for the month.
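The table lookup mentioned in the first solution might look something like this. It is a minimal sketch with English abbreviations hard-coded; month_abbrev and format_day_month are names invented for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the abbreviated name for month 1-12, or NULL if out of range. */
const char *month_abbrev(int month)
{
    static const char *const names[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    if (month < 1 || month > 12)
        return NULL;
    return names[month - 1];
}

/* Writes e.g. "1 Mar" into buf. Returns the length of the full text,
   or -1 if the month is invalid. */
int format_day_month(int day, int month, char *buf, size_t size)
{
    const char *name = month_abbrev(month);
    if (name == NULL)
        return -1;
    return snprintf(buf, size, "%d %s", day, name);
}
```

Displayed this way, the 1st of March reads "1 Mar" regardless of whether the reader expects day-first or month-first numbers.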

Problem: Year Abbreviations

Avoid allowing the abbreviation of the year; it just leads to chaos.

Solution: if a year must be typed in by the user, simply refuse to accept a year which has fewer than 4 digits. Store the full year in any database or data structure you use. Don't bother trying to save that "extra 2 digits" - there's no point. Use integers to store dates, instead of strings, because strings run out of space a lot faster than integers.

Solution 2: when printing dates, use the largest-to-smallest format. E.g. 1994/7/16 (July 16, 1994)

This has the advantage of not being culture-specific, unlike American vs British date abbreviations.
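Refusing short years at input time, as the first solution suggests, takes only a few lines. This sketch uses an invented function name, parse_full_year, and an arbitrary upper limit of 9 digits so that the result always fits in a 32-bit int.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the year as an integer, or -1 if the text is rejected.
   Only strings of 4 to 9 decimal digits are accepted. */
int parse_full_year(const char *text)
{
    size_t digits = 0;
    int year = 0;
    if (text == NULL)
        return -1;
    for (; text[digits] != '\0'; digits++) {
        if (!isdigit((unsigned char)text[digits]) || digits >= 9)
            return -1;               /* not a digit, or too long */
        year = year * 10 + (text[digits] - '0');
    }
    return digits >= 4 ? year : -1;  /* refuse abbreviated years */
}
```

Here "1997" is accepted and "97" is rejected. Five-digit years such as "12000" are accepted too, so the check does not build in a limit at the year 9999.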

A Machine-Readable Date Representation

There is a proposed standard, based on ISO 8601, for representing dates in data files and for transmission between programs:

   1994
   1994-07-16
   1994-07-16T19:20+01:00
   1994-07-16T19:20:30+01:00
   1994-07-16T19:20:30.45+01:00

The year may never be abbreviated, but the above date combinations are allowed.

The T marks the start of the time component. Note that fractions of a second are allowed.

The +01:00 is offset from Universal Time, and may be negative for those time zones west of Greenwich. Instead of an offset, the letter Z may be used to represent Universal Time.

This format avoids a lot of problems with storing dates inside databases. (Still has problems with colonies on Mars)
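Producing this format from C is mostly a matter of zero-padded fields. The sketch below covers only the whole-second form; format_timestamp is a made-up name, and the caller is assumed to supply the offset from Universal Time in minutes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Writes t as "YYYY-MM-DDThh:mm:ss" followed by "Z" (offset 0) or by a
   "+hh:mm" / "-hh:mm" offset. Returns buf, or NULL if it does not fit. */
char *format_timestamp(const struct tm *t, int offset_minutes,
                       char *buf, size_t size)
{
    size_t used;
    int more;
    int n = snprintf(buf, size, "%04d-%02d-%02dT%02d:%02d:%02d",
                     t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
                     t->tm_hour, t->tm_min, t->tm_sec);
    if (n < 0 || (size_t)n >= size)
        return NULL;
    used = (size_t)n;

    if (offset_minutes == 0)
        more = snprintf(buf + used, size - used, "Z");
    else
        more = snprintf(buf + used, size - used, "%c%02d:%02d",
                        offset_minutes < 0 ? '-' : '+',
                        abs(offset_minutes) / 60, abs(offset_minutes) % 60);
    if (more < 0 || (size_t)more >= size - used)
        return NULL;
    return buf;
}
```

Given a struct tm for July 16, 1994 at 19:20:30 and an offset of 60 minutes, this writes 1994-07-16T19:20:30+01:00. Note the tm_year and tm_mon adjustments, which are easy to forget.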

User Interface Lessons

Internationalisation of date-display is a noble goal, as long as it doesn't lead to ambiguity.

Don't confuse display with storage of dates!

Validate input whenever you can. Often you can stop a problem right there.

Check your leap-year calculations.

Don't accept partial or incorrect dates. Don't make your program "guess" what the correct values "should be". Alert the user and make them re-input the date.

Use drop-down lists for months, and use month names in input or for display. It makes it a lot harder to go wrong.

Software Engineering Lessons

Don't 'optimise' a solution if it makes it incorrect.
(e.g. Don't reduce years to two digits just to save an insignificant amount of memory.)

Don't use valid data to signal an error!
(e.g. '00' and '99' are valid years in a two-digit system, why assume otherwise?) Better to use a value which is 'impossible' or 'out of range'. Better still is to use a separate variable entirely to indicate errors, thus avoiding the problem of finding an 'impossible' value.
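One way to keep the error indicator separate is to return it alongside the value. This is only a sketch, assuming a legacy two-character year field; the struct yy_result and the function parse_two_digit_year are invented for the example.

```c
#include <ctype.h>
#include <stddef.h>

/* The parsed value and the error indicator travel separately. */
struct yy_result {
    int ok;      /* 1 if value holds a parsed year, 0 on error */
    int value;   /* 0-99 when ok is 1 */
};

/* Parses exactly two decimal digits. Every value from 00 to 99 is valid
   data; failure is reported through ok, never through a "special" year. */
struct yy_result parse_two_digit_year(const char *text)
{
    struct yy_result r = {0, 0};
    if (text != NULL
        && isdigit((unsigned char)text[0])
        && isdigit((unsigned char)text[1])
        && text[2] == '\0') {
        r.ok = 1;
        r.value = (text[0] - '0') * 10 + (text[1] - '0');
    }
    return r;
}
```

With this shape, "00" and "99" come back as ordinary years with ok set, and a malformed field such as "9x" comes back with ok cleared.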

Don't make systems more complicated than they need to be.
(e.g. Why did the designers of C force programmers to add 1900 to tm_year ? That just introduces bugs.)

Software Engineering Lessons (continued)

Research a problem before writing mission-critical code.
(e.g. Don't assume you know the definition of a leap year, you might get it wrong. Research it, or use someone else's function to calculate it, provided they've shown their software is correct.)

Avoid introducing arbitrary limits into your programs.
(e.g. why limit yourself to two digits, when four gives you the ability to handle 10,000 years, and a 32-bit integer allows your program to handle years up to about 2,000,000,000, or about 4,000,000,000 if unsigned?)

Don't fix a bug with another bug.
(e.g. Trying to make your program 'smart' so that years 50-99 are 1950-1999, while years 00-49 are 2000-2049 actually introduces a new, nastier bug into your program: the Y2050 bug. It is hard to find, and each program may have a different bug to contend with: Y2030, Y2070, etc. Overall effect: computers become less and less trustworthy for date calculations as the century continues.)

Test that your program works.
(e.g. My old Pentium had a clock which could be set from 1995-2099. Looked Y2K compatible. BUT, if you changed the year to anything after 1999, it would reset the value to 2094 at reboot time. The engineers had 'fixed' the bug by changing the code, but not actually testing it. Sigh.)