Worklog 2015-01-10


Okay, I was so frantic about the language updates and went to (mostly) the library maintenance mode. If you saw encoding or chrono from a lot, that's because I tended to update per each breaking nightly.

Anyways, we've finally got 1.0.0 alpha and I can finally refrain from simply keeping libraries up. I'm now working on Chrono 0.2, which will have a new Offset design.

Grisu and rust-strconv

By the way, what happened to rust-strconv? In the previous (i.e. 3 week old) post I've said the initial version of Grisu can be released within a week. That was a serious misprediction and it actually took two full weeks. (What is Grisu, by the way? Bryan O'Sullivan has a good introduction.)

Grisu by itself is a simple algorithm to understand. It first scales the input into a certain convenient range (while keeping the exponent, obviously), so that the range can be easily fit to u64. It then maintains two approximate ranges for the actual range in which values would round to that number; one overestimates, another underestimates. Quoting the diagram in the comments:

        +- actual range of minus
  | <---|---------------------- unsafe region --------------------------> |
  |     |                                                                 |
  |  |<--->|  | <--------------- safe region ---------------> |           |
  |  |     |  |                                               |           |
  |1 ulp|1 ulp|                 |1 ulp|1 ulp|                 |1 ulp|1 ulp|
  |<--->|<--->|                 |<--->|<--->|                 |<--->|<--->|
  |   minus   |                 |     v     |                 |   plus    |
minus1     minus0           v - 1 ulp   v + 1 ulp           plus0       plus1

Both the scaled value and ranges are repeatedly narrowed down until we are out of significant digits. (There is one important inequality proved in the paper which can be used to quickly test this.) At this stage we have one possible shortest representation, so we try other representations to find the shortest representation which is also closest to the actual value. Finally, we can verify that it actually is in the underestimated range so we can safely return it. Each step has its own rounding error, but it is well characterized so we account for that.

The problem with implementing Grisu is not the complexity; it's rather an analysis, which is very unwieldy to understand. For example, the implementation shouldn't overflow, and Grisu code actively avoids that, but both the paper and the code is shallow in details. The code has an extensive documentation, but not a proof, and it is very hard to be sure on edge cases. At the end, it took almost a month to fully understand and write every required constraint in implementations down. Frankly speaking, my comments wouldn't really help anyway if you are to write your own reimplementation of Grisu, but at least it should guide potential reviewers.

Fortunately, my attempt has paid off. Combined with Dragon it swiftly overshadows the original naive implementation both in the performance (up to 1000x faster) and the accuracy. I've exhaustively verified that Grisu for f32 returns the same result as Dragon when it does give a result. Just out of curiosity, I've also counted how many f32 does return None; there are precisely 17,763,160 such values out of 2,139,095,039 positive finite values. I have some idea for improving on severe cases1 but that would be another story.

There are two more possible modes of floating point printing. Grisu-like algorithm can be used to quickly try to print a predefined number of significant digits or up to certain number of digits after a decimal point if possible. I have a working code to do the former (with an informal proof in progress), but it seems not to fit well to the common interface shared by Dragon. The common interface can notify that the following digits are zeroes, but it is not very obvious that Grisu can report None in that case. (Grisu cannot really give the exact answer in this case.) I'm currently studying the C++ implementation for possible resolution.

  1. In the worst case, 50% of 2^23 f32 values between 2^21 and 2^22 return None (it's 0.8% in the full range, for comparison). This is mostly due to Grisu's inability to handle the exact halfway cases very common to that range. Note that this range is relatively close to 1 (than many others), so it can impact the real world performance of Grisu.