Rustlog : Worklog 2015-01-17

In the case you don't know, there are a number of Korean people interested in and working with Rust. Consequently, the emergence of Korean Rust User Group should be evident. It organizes a soon-to-be-monthly series of loose meetups, where people concentrate on getting things done no matter they are working on (libraries, Rust PRs and so on).

Today we had the fourth meetup (the previous one was on 2014-12-20), and as an advertisement (haha) I'm going to post a short summary of things that actually have been done in that meetup soon. For this worklog, I'm going to discuss my newest PR, #21304.

Rustdoc Testing

It is relatively well known that Rustdoc is one of the most under-tested components in the entire tree. That is partly because we are dealing with the HTML output (the write-only language^TM since circa 2004) which is often hard to test without manual intervention. Having fixed several Rustdoc issues in the past, I naturally wanted to get the regression test for Rustdoc but was surprised that there are only handful number of actual tests.

#21304 is my attempt to resolve this issue. It adds a moderately-sized Python script for matching against given portion of the HTML file. Hopefully (note the caveats listed below), it will make writing Rustdoc tests easier since the code and verification data are integrated. For example, the run-make/rustdoc-where test uses this verification script to check if the where clause is present in the output:

grep "Alpha.*where.*A:.*MyTrait" $DOC/struct.Alpha.html > /dev/null
echo "Alpha"
grep "Bravo.*where.*B:.*MyTrait" $DOC/trait.Bravo.html > /dev/null
echo "Bravo"
grep "charlie.*where.*C:.*MyTrait" $DOC/fn.charlie.html > /dev/null
echo "Charlie"
grep "impl.*Delta.*where.*D:.*MyTrait" $DOC/struct.Delta.html > /dev/null
echo "Delta"
# and so on

This chunk of code is separate from the actual source code being compiled. With this PR, the following Rust code can also be used to verify its Rustdoc output:

// @matches foo/struct.Alpha.html '//pre' "Alpha.*where.*A:.*MyTrait"
pub struct Alpha<A> where A: MyTrait;

// @matches foo/trait.Bravo.html '//pre' "Bravo.*where.*B:.*MyTrait"
pub trait Bravo<B> where B: MyTrait {}

// @matches foo/fn.charlie.html '//pre' "charlie.*where.*C:.*MyTrait"
pub fn charlie<C>() where C: MyTrait {}

pub struct Delta<D>;
// @matches foo/struct.Delta.html '//*[@class="impl"]//code' "impl.*Delta.*where.*D:.*MyTrait"
impl<D> Delta<D> where D: MyTrait {
    pub fn delta() {}
}

// and so on

This actually constrains the portion of HTML to look at, so we can avoid some kinds of false positives. The cons is XPath, but that's the best we can achieve without introducing any non-standard Python dependencies. (Yes, Python has a built-in XPath implementation, slightly flawed.) Any suggestion in this area is welcomed.

I have some plans to do after this PR gets accepted. The first is, of course, turning Rustdoc tests into a new kind of tests. This will make writing Rustdoc tests more easier (add one file and done) so the Rustdoc fixes would get proper regression tests. I expect that we will need tons of A-needstest after this change.

The second is a built-in link checker and other analyses. I already worked on #15309, and summarized the status of internal links in the generated docs, but frankly this should be done automatically. XMPPwocky has demonstrated the usefulness of link checker, but his checker uses html5ever which is too large to integrate into the main tree. If we can get a reasonable performance with a link checker in Python, it would be great. (The link checker based on html5ever might be useful for bigger testing infrastructure, though.)

Oh, and as aside I should really resume working on #19606...