Long time no see. My day job and various other distractions kept me from sustained work like blogging. Well, I finally finished Chrono 0.2, so I have something to write about now.
In the previous post about Chrono, I discussed the new time zone and offset handling of Chrono 0.2. This alone is a big change, but another big change in Chrono 0.2 is a new formatting and parsing API. Chrono 0.1 already had a rudimentary formatting API via the format method, but it didn't have a parsing API, which was another pain point besides local date/time handling.
Chrono 0.2 has three new API pieces, redesigned for formatting and parsing:
- Formatting syntax representation ("items") and parsing
- Formatting with items
- Parsing with items
Altogether they form an advanced formatting facility in Chrono 0.2. I'll try to briefly discuss their designs and justifications.
Formatting Items
In Chrono, a formatting item is a unit of formatting or parsing. For example, a strftime-like format string %Y-%m-%d has five different formatting items: %Y, -, %m, - and %d. Chrono decouples the formatting syntax from the actual meaning of formatting items, so they have the following (somewhat verbose) internal representations:
[Item::Numeric(Numeric::Year, Pad::Zero),
Item::Literal("-"),
Item::Numeric(Numeric::Month, Pad::Zero),
Item::Literal("-"),
Item::Numeric(Numeric::Day, Pad::Zero)]
This decoupling allows Chrono to support multiple formatting syntaxes, such as YYYY-MM-DD or the Go-like 2006-01-02, in place of the strftime-like one. Also, Chrono can have "hidden" formatting items that can be used for internal purposes. RFC 2822 and RFC 3339 support is implemented in this way.
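To make the decoupling concrete, here is a minimal sketch, not Chrono's actual code. The Item, Numeric and Pad types are cut-down stand-ins for the real ones, and strftime_items and picture_items are hypothetical helpers I made up for illustration. Both front-ends collect into a Vec only to keep the sketch short. The point is that two different syntaxes end up as the same list of items.

```rust
// Simplified, hypothetical stand-ins for Chrono's item types.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Numeric { Year, Month, Day }

#[derive(Debug, PartialEq, Clone, Copy)]
enum Pad { Zero }

#[derive(Debug, PartialEq, Clone, Copy)]
enum Item<'a> {
    Literal(&'a str),
    Numeric(Numeric, Pad),
}

/// Front-end 1: a tiny subset of strftime syntax (%Y, %m, %d and literals).
/// Returns None on an unknown or truncated specifier.
fn strftime_items(fmt: &str) -> Option<Vec<Item<'_>>> {
    let mut items = Vec::new();
    let mut rest = fmt;
    while !rest.is_empty() {
        if let Some(after) = rest.strip_prefix('%') {
            let mut chars = after.chars();
            let spec = match chars.next()? {
                'Y' => Numeric::Year,
                'm' => Numeric::Month,
                'd' => Numeric::Day,
                _ => return None,
            };
            items.push(Item::Numeric(spec, Pad::Zero));
            rest = chars.as_str();
        } else {
            // Everything up to the next '%' is one literal item.
            let end = rest.find('%').unwrap_or(rest.len());
            items.push(Item::Literal(&rest[..end]));
            rest = &rest[end..];
        }
    }
    Some(items)
}

/// Front-end 2: a hypothetical "picture" syntax (YYYY, MM, DD and literals).
fn picture_items(fmt: &str) -> Vec<Item<'_>> {
    let fields = [("YYYY", Numeric::Year), ("MM", Numeric::Month), ("DD", Numeric::Day)];
    let mut items = Vec::new();
    let mut lit_start = 0; // start of the pending literal run
    let mut pos = 0;
    while pos < fmt.len() {
        let hit = fields.iter().find(|(name, _)| fmt[pos..].starts_with(*name));
        if let Some(&(name, num)) = hit {
            if lit_start < pos {
                items.push(Item::Literal(&fmt[lit_start..pos]));
            }
            items.push(Item::Numeric(num, Pad::Zero));
            pos += name.len();
            lit_start = pos;
        } else {
            // Advance by one whole character to stay on a UTF-8 boundary.
            pos += fmt[pos..].chars().next().map_or(1, char::len_utf8);
        }
    }
    if lit_start < fmt.len() {
        items.push(Item::Literal(&fmt[lit_start..]));
    }
    items
}
```

With these two helpers, "%Y-%m-%d" and "YYYY-MM-DD" yield the same five items, so whatever consumes the items does not need to know which syntax the user wrote.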
The formatting item is a good abstraction, but every abstraction comes with complexity. In the case of Chrono, the complexity arises from the desire to avoid allocation. The number of formatting items is proportional to the length of the format string in the worst case, so we cannot blindly collect items into a collection. Instead, Chrono returns an Iterator of formatting items and directly consumes that iterator for printing the date and time. Therefore the following identity holds:
assert_eq!(StrftimeItems::new("%Y-%m-%d").collect::<Vec<_>>(),
[Item::Numeric(Numeric::Year, Pad::Zero),
Item::Literal("-"),
Item::Numeric(Numeric::Month, Pad::Zero),
Item::Literal("-"),
Item::Numeric(Numeric::Day, Pad::Zero)]);
I believe this redesign has maximal flexibility while retaining the ability to avoid std entirely.
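The iterator idea can be sketched roughly as follows. This is an illustration under my own assumptions, not Chrono's implementation: ItemsSketch, the flattened Item enum, format_with_items_sketch and format_ymd are all hypothetical names, and a date is just a (year, month, day) tuple. The iterator keeps only the unparsed remainder of the format string, and the printer consumes items one by one, so no list of items is ever allocated.

```rust
use std::fmt::{self, Write};

// Flattened stand-in for the real item type; Year stands in for
// Item::Numeric(Numeric::Year, Pad::Zero), and so on.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Item<'a> {
    Literal(&'a str),
    Year,
    Month,
    Day,
    Error, // unknown or truncated specifier
}

/// Lazily splits a strftime-like string into items.
/// The only state is the part of the format string not yet parsed.
struct ItemsSketch<'a> {
    rest: &'a str,
}

impl<'a> Iterator for ItemsSketch<'a> {
    type Item = Item<'a>;

    fn next(&mut self) -> Option<Item<'a>> {
        if self.rest.is_empty() {
            return None;
        }
        if let Some(after) = self.rest.strip_prefix('%') {
            let mut chars = after.chars();
            let item = match chars.next() {
                Some('Y') => Item::Year,
                Some('m') => Item::Month,
                Some('d') => Item::Day,
                _ => Item::Error,
            };
            self.rest = chars.as_str();
            Some(item)
        } else {
            let end = self.rest.find('%').unwrap_or(self.rest.len());
            let (literal, rest) = self.rest.split_at(end);
            self.rest = rest;
            Some(Item::Literal(literal))
        }
    }
}

/// Prints a date by consuming the items directly; no intermediate Vec is built.
fn format_with_items_sketch<'a, W: Write>(
    out: &mut W,
    (year, month, day): (i32, u32, u32),
    items: impl Iterator<Item = Item<'a>>,
) -> fmt::Result {
    for item in items {
        match item {
            Item::Literal(s) => out.write_str(s)?,
            Item::Year => write!(out, "{:04}", year)?,
            Item::Month => write!(out, "{:02}", month)?,
            Item::Day => write!(out, "{:02}", day)?,
            Item::Error => return Err(fmt::Error),
        }
    }
    Ok(())
}

/// Convenience wrapper: parse the pattern lazily and print in one pass.
fn format_ymd(date: (i32, u32, u32), pattern: &str) -> Result<String, fmt::Error> {
    let mut out = String::new();
    format_with_items_sketch(&mut out, date, ItemsSketch { rest: pattern })?;
    Ok(out)
}
```

The String in format_ymd is only the output buffer. Since the printer is generic over fmt::Write, the same loop could write into a fixed-size buffer where allocation is not available.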
Formatting with Items
Formatting with items is done with a new format_with_items method. The original format method is now a thin wrapper over that. This part of Chrono remains relatively unchanged, but there are some notable small changes which deserve an explanation:
- The original format had a bug with leap seconds: instead of printing 23:59:60.234567890, it would print 23:59:59.1234567890. It originates from Chrono's unique handling of leap seconds, and has been fixed in Chrono 0.2.
- There are Numeric::Nanosecond and Fixed::Nanosecond. The former corresponds to strftime's %f specifier and has been present since Chrono 0.1. The latter is a new formatting item which may print nothing, .###, .###### or .######### depending on the available accuracy. (An empty string is for the whole number of seconds.) This "adaptive" version turned out to be much more useful for parsing.
- Numeric::Timestamp (%s) is now supported. It was delayed due to the proper local date/time handling.
- While this change is quite subtle, Numeric::Year (%Y) and Numeric::IsoYear (%G) now have an explicit sign for years not between 1 BCE and 9999 CE. It is a necessity for bijective parsing, which means that once printed, any valid date and time can be parsed back into the original value. This particular scheme is adopted from ISO 8601:2004, so this is not really Chrono's own invention.
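Three of the changes above are easy to sketch in a few lines. The functions below are my own hypothetical helpers, not Chrono's code. In particular, seconds_with_leap assumes that a leap second is stored as a nanosecond value of 1,000,000,000 or more on top of second 59, which is what the buggy output above suggests.

```rust
/// Adaptive fractional seconds in the spirit of Fixed::Nanosecond:
/// nothing, 3, 6 or 9 digits depending on the available accuracy.
/// Expects nano < 1_000_000_000.
fn adaptive_fraction(nano: u32) -> String {
    if nano == 0 {
        String::new()
    } else if nano % 1_000_000 == 0 {
        format!(".{:03}", nano / 1_000_000)
    } else if nano % 1_000 == 0 {
        format!(".{:06}", nano / 1_000)
    } else {
        format!(".{:09}", nano)
    }
}

/// Year with an explicit sign outside 0000..=9999 (year 0 is 1 BCE),
/// so that a printed year can always be parsed back unambiguously.
fn signed_year(year: i32) -> String {
    if (0..=9999).contains(&year) {
        format!("{:04}", year)
    } else {
        format!("{:+05}", year)
    }
}

/// Seconds with a nine-digit fraction.
/// ASSUMPTION: a leap second is represented as nano >= 1_000_000_000.
fn seconds_with_leap(sec: u32, nano: u32) -> String {
    let (sec, nano) = if nano >= 1_000_000_000 {
        (sec + 1, nano - 1_000_000_000)
    } else {
        (sec, nano)
    };
    format!("{:02}.{:09}", sec, nano)
}
```

For example, signed_year(10000) gives "+10000" while signed_year(2014) stays "2014", and seconds_with_leap(59, 1_234_567_890) gives "60.234567890" instead of the buggy "59.1234567890".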
Parsing with Items
Okay, this is a fun part. Basically Chrono 0.2 has the following parsing algorithm:
- A fixed string parses as is: it has to appear in the input string verbatim.
- A sequence of one or more whitespace characters consumes zero or more whitespace characters.
- Most numeric items (Numeric::*) have a predefined parsing width, the maximal number of digits that can be consumed. They consume zero or more whitespace characters followed by one or more digits, up to that width. An exception is made for Numeric::Year and Numeric::IsoYear, which may accept an arbitrary number of digits when preceded by a sign.
- Fixed items (Fixed::*) have their own parsing logic, but normally do not consume preceding whitespace. They also normally ignore case.
- At the end of the formatting items, the whole input string should have been consumed.
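The numeric rule is the interesting one, so here is a rough sketch of it. scan_number and parse_ymd are hypothetical helpers, not the functions in Chrono's scan module, and the widths of 4, 2 and 2 digits for year, month and day are my assumption for the example. The signed-year exception is left out.

```rust
/// Consumes optional leading whitespace, then between 1 and `max_width`
/// ASCII digits. Returns the remaining input and the parsed value.
fn scan_number(s: &str, max_width: usize) -> Option<(&str, u32)> {
    let s = s.trim_start();
    let len = s
        .bytes()
        .take(max_width)
        .take_while(u8::is_ascii_digit)
        .count();
    if len == 0 {
        return None; // at least one digit is required
    }
    let value = s[..len].parse::<u32>().ok()?;
    Some((&s[len..], value))
}

/// Parses year, month and day separated by the literal `sep`.
/// An empty `sep` corresponds to a format string like %Y%m%d.
fn parse_ymd(input: &str, sep: &str) -> Option<(u32, u32, u32)> {
    let (s, year) = scan_number(input, 4)?;
    let s = s.strip_prefix(sep)?; // a fixed string must appear as is
    let (s, month) = scan_number(s, 2)?;
    let s = s.strip_prefix(sep)?;
    let (s, day) = scan_number(s, 2)?;
    // The whole input must have been consumed.
    if s.is_empty() {
        Some((year, month, day))
    } else {
        None
    }
}
```

Because each number stops at its maximal width, parse_ymd("20140206", "") can split the digits into 2014, 02 and 06 without any separator, while parse_ymd("2014-2-6", "-") accepts the shorter fields. Note that this step only extracts numbers; it does not check whether they form a valid date.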
This is modelled after strptime's parsing algorithm, which seems to handle lots of corner cases. This allows a lax input like 2014-2-6 for a format string %Y-%m-%d. The fixed parsing width allows a format string like %Y%m%d, which would accept strings like 20140206.
Parsing is daunting work, implemented with two rather big modules (chrono::format::{parse,scan}), but it's actually the easy part. Parsing only yields different date/time parts, which have to be merged into actual values via the resolution algorithm. As an easy example, consider RFC 2822. The RFC 2822 date and time format has a day-of-week part, which should be consistent with the other date parts when specified. But a strptime-based parser would happily accept inconsistent input:
>>> import time
>>> time.strptime('Wed, 31 Dec 2014 04:26:40 +0000',
'%a, %d %b %Y %H:%M:%S +0000')
time.struct_time(tm_year=2014, tm_mon=12, tm_mday=31,
tm_hour=4, tm_min=26, tm_sec=40,
tm_wday=2, tm_yday=365, tm_isdst=-1)
>>> time.strptime('Thu, 31 Dec 2014 04:26:40 +0000',
'%a, %d %b %Y %H:%M:%S +0000')
time.struct_time(tm_year=2014, tm_mon=12, tm_mday=31,
tm_hour=4, tm_min=26, tm_sec=40,
tm_wday=3, tm_yday=365, tm_isdst=-1)
Resolving date/time parts is littered with corner cases, and that's why common date/time parsers do not implement it correctly; as far as I know, glibc, Python and JodaTime completely ignore the resolution. Therefore I'm glad to announce that Chrono 0.2 has a complete and known-to-be-correct resolution algorithm.
Chrono has a dedicated date/time part storage called Parsed.
The resolution algorithm is hard to describe as is,
but the relevant source code is relatively well-commented
and worth reading if you are interested in the algorithm.
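To give a feel for what resolution means, here is a heavily reduced sketch of the RFC 2822 case above. Parts is a hypothetical stand-in with only four fields and is not the real Parsed type; resolve, weekday_of and days_in_month are my own helpers. The real algorithm handles many more fields and combinations. The weekday is computed with Sakamoto's method for the proleptic Gregorian calendar.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Weekday { Sun, Mon, Tue, Wed, Thu, Fri, Sat }

/// Hypothetical, cut-down storage for parsed parts; every field is optional.
#[derive(Default)]
struct Parts {
    year: Option<i32>,
    month: Option<u32>,
    day: Option<u32>,
    weekday: Option<Weekday>,
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        4 | 6 | 9 | 11 => 30,
        2 if year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) => 29,
        2 => 28,
        _ => 31,
    }
}

/// Day of the week via Sakamoto's method; expects a valid month and day.
fn weekday_of(year: i32, month: u32, day: u32) -> Weekday {
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    const DAYS: [Weekday; 7] = [
        Weekday::Sun, Weekday::Mon, Weekday::Tue, Weekday::Wed,
        Weekday::Thu, Weekday::Fri, Weekday::Sat,
    ];
    // January and February count as part of the previous year.
    let y = if month < 3 { year - 1 } else { year };
    let sum = y + y.div_euclid(4) - y.div_euclid(100) + y.div_euclid(400)
        + OFFSETS[(month - 1) as usize]
        + day as i32;
    DAYS[sum.rem_euclid(7) as usize]
}

/// Merges the parts into a date, rejecting missing, out-of-range
/// or mutually inconsistent parts.
fn resolve(parts: &Parts) -> Result<(i32, u32, u32), &'static str> {
    let (y, m, d) = match (parts.year, parts.month, parts.day) {
        (Some(y), Some(m), Some(d)) => (y, m, d),
        _ => return Err("not enough parts"),
    };
    if !(1..=12).contains(&m) || d == 0 || d > days_in_month(y, m) {
        return Err("out of range");
    }
    match parts.weekday {
        Some(w) if w != weekday_of(y, m, d) => Err("inconsistent weekday"),
        _ => Ok((y, m, d)),
    }
}
```

With this sketch, the parts from "Wed, 31 Dec 2014" resolve to a date, while the same date labelled Thu is rejected as inconsistent instead of being silently accepted.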
Chrono 0.3?
This may be a premature assumption, but Chrono 0.2 is intended to be a stable base for future API evolution. The main design decisions were already made in 0.1, but 0.2 completes them and fixes the biggest problem with 0.1.
There are some issues I've been thinking about for 0.3. Some of them will definitely be implemented in 0.3 (e.g. tzfile support), while others are somewhat elusive (e.g. additional format syntax). But these issues do not replace users' feedback. Personally I'd like to thank /u/savage884, who brought up the problem of local date handling in the /r/rust post. That post crucially helped me finish the work and release 0.2. I hope others will give feedback on Chrono like that, so that Chrono doesn't remain a library "considered annoying but designed so", a situation I really don't like.