Worklog 2015-01-13

Rustlog

I've finally got the initial design of Chrono 0.2 working. It took so much time, partially because I'm working on dozens of other projects including Rust libraries. It solves one of the major annoyance with Chrono, so I'm glad that this new design is promising.

Time zones in Chrono 0.1

As always, I start by pointing to Erik Naggum's excellent essay about date and time. (Yes, I'm terrible at story telling and I won't say a lot about that.) Most importantly, the core aspect of time zone is that it is ultimately a political creation rather than a natural necessity. This complicates lots of things, though Author David Olson and others have done a tremendously important work in this regard.

There are several problems with time zones:

  1. There is no reliable way to handle the local date in the future.
  2. There can exist a local date which occurred in two or more instants.
  3. There can exist a local date which never has been occurred.
  4. The conversion process itself is seriously annoying and easy to make a mistake.

(If you are interested in the timekeeping, you'll realize that this list equally applies to leap seconds. I had to deal with them in Chrono as well, and chose to make them invisible to the normal usage.)

In fact, Chrono's original Offset design does explicitly account for these problems. In Chrono, the local date is a concept only meaningful to accessors and formatting routines, eliminating the very source of 1. The possibility of 2 and 3 is handled via the LocalResult enum, while 4 is handled via... delegating everything to the Offset. This decision is partly because we would only have a handful number of Offset implementations, so we have to implement them anyway. In the end, we had something like this:

pub trait Offset: Clone + Show {
    fn local_minus_utc(&self) -> Duration;

    fn from_local_date(&self, local: &NaiveDate) -> LocalResult<Date<Self>>;
    fn from_local_time(&self, local: &NaiveTime) -> LocalResult<Time<Self>>;
    fn from_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<DateTime<Self>>;

    fn to_local_date(&self, utc: &NaiveDate) -> NaiveDate;
    fn to_local_time(&self, utc: &NaiveTime) -> NaiveTime;
    fn to_local_datetime(&self, utc: &NaiveDateTime) -> NaiveDateTime;

    fn ymd(&self, year: i32, month: u32, day: u32) -> Date<Self> { ... }
    // other constructors follow
}

pub struct DateTime<Off> {
    datetime: NaiveDateTime,
    offset: Off
}

// Date and Time follows

This sounds good. You can put the offset data into the timezone-aware DateTime, and use it to convert to the local date (to_local_date). In the converse, DateTime has to be created from Offset so that it converts to the internal representation in UTC (from_local_date).

But you might wonder: why is local_minus_utc separate from to_local_datetime? The latter can be implemented via the former, right? Yes! In the current implementation the latter is redundant. And this redundancy suggests a bigger problem.

Originally, to_local_datetime was to be used in the absence of the exact offset to UTC. This alone is enough for converting UTC to the local time, and it is still used in the Offset conversion where the original value is converted to UTC then to the target time zone. But this is inefficient, especially if we have a large table of zone transitions. Therefore we have local_minus_utc for caching the calculated current offset. The caller was expected to call local_minus_utc for converting the current value to UTC and to_local_datetime etc. for other cases. For the fixed-offset time zones like UTC these methods will be largely a simple arithmetic, so we only pay what we use.

In the reality this scheme didn't work well. Local was an example of the glaring problem; since it has to cache the value, it should have a field, but that meant that we need to create a Local instance every time we convert to the local date! This defies the simple interface like dt.with_offset(UTC), and since such Local instance doesn't know about the exact offset, we have a separate flag indicating the offset is correct or not. (In Rust, this would translate to Option<FixedOffset>.) This even breaks the original premise of "only paying what we actually use".

After I realized this problem, I'd tried several solutions and spectacularly failed. It was clear that we really need two kinds of types, but I was not sure how to do that.

Time zones in Chrono 0.2

Associated types, originally proposed in RFC #195, were a game changer for this problem. They allow for two types to be smoothly connected in the compile time. The resulting design is as follows:

pub trait Offset: Sized + Clone + fmt::Show {
    fn local_minus_utc(&self) -> Duration;
}

pub trait TimeZone: Sized {
    type Offset: Offset;

    fn from_offset(offset: &Self::Offset) -> Self;

    fn offset_from_local_date(&self, local: &NaiveDate) -> LocalResult<Self::Offset>;
    fn offset_from_local_time(&self, local: &NaiveTime) -> LocalResult<Self::Offset>;
    fn offset_from_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<Self::Offset>;

    fn offset_from_utc_date(&self, utc: &NaiveDate) -> Self::Offset;
    fn offset_from_utc_time(&self, utc: &NaiveTime) -> Self::Offset;
    fn offset_from_utc_datetime(&self, utc: &NaiveDateTime) -> Self::Offset;

    // helpers and constructors follow
}

pub struct DateTime<Tz: TimeZone> {
    datetime: NaiveDateTime,
    offset: Tz::Offset,
}

// Date and Time follows

This new design directly shows that we are dealing with two different types! TimeZone creates an Offset which is a storage-oriented type, which can be converted back to the TimeZone via from_offset. TimeZone is used for creating date and time values, while Offset is used for converting to the local time. I originally tried to avoid separate trait for local_minus_utc, but ultimately abandoned that plan to make dt.offset().local_minus_utc() possible.

There are a set of TimeZones and Offsets available:

TimeZone instance Offset instance
UTC UTC
FixedOffset FixedOffset
Local FixedOffset
TzFile (*) TzFileOffset (*)
TzRule (*) TzRuleOffset (*)

(* Some instances are under the development.)

One can note that some time zones are their own offsets, as they do not have an additional data (i.e. cache) for the storage, and Local reuses the FixedOffset as Local itself has no state. Other time zones have to cache its offset and the transition data, hence separate offset types.

TzFile and TzRule is a part of tzdata support; they will be typically used as a reference-counted version, such as Rc<TzFile> or Arc<TzRule>, to avoid the deep copying (which is common in Chrono). TzRule implements the rule string in POSIX TZ environment variable, which is not that useful by its own (since we can use Local anyway), but it's TzFile's solution to the aforementioned problem 1: the future zone transition is encoded as a form of TzRule, and we need to implement it.

At the end this new design is quite promising, but even after #![feature(associated_types)] is gone associated types are still cutting edge features. I had to disable debuginfo due to the Rust issue #21010, for example. Hopefully though, this shall not affect the validity of this design.