Google Web Authoring Survey

Google have released a bunch of high-level statistics from a recent survey that looks at the relative frequency of HTML elements across more than a billion web documents. Of particular relevance to microformatters and others interested in semantic HTML is the data on classes and hyperlinks.

The most common class attributes are a strange mix. There are fewer presentational conventions than I might have expected (just "smalltext", "style1" and "white"), but the prevalence of class="title" and class="content" points to potential ambiguities for the deliberate semantics of microformats like hCard, hAtom, and hReview.
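To make the ambiguity concrete: hCard uses "title" for a person's job title, but that meaning only holds inside an element with class="vcard". Below is a minimal sketch using only Python's standard html.parser, where the TitleFinder class and the sample markup are hypothetical. It records, for every class="title" element, whether it sits inside an hCard root. A parser that scopes properties to the root can tell a generic heading from a job title; a naive page-wide search for "title" could not.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TitleFinder(HTMLParser):
    """Hypothetical helper: collect class="title" text, noting if it is inside a vcard."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one entry per open element: True if inside a vcard
        self.capture = None   # (inside_vcard, stack depth at start, text parts)
        self.titles = []      # list of (inside_vcard, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        inside = (self.stack[-1] if self.stack else False) or "vcard" in classes
        if "title" in classes and self.capture is None:
            self.capture = (inside, len(self.stack), [])
        self.stack.append(inside)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        self.stack.pop()
        # The captured element closes when we return to the depth it opened at.
        if self.capture and len(self.stack) == self.capture[1]:
            inside, _, parts = self.capture
            self.titles.append((inside, "".join(parts).strip()))
            self.capture = None

    def handle_data(self, data):
        if self.capture:
            self.capture[2].append(data)


page = """
<h2 class="title">Quarterly report</h2>
<div class="vcard">
  <span class="fn">Jane Doe</span>, <span class="title">Editor</span>
</div>
"""

finder = TitleFinder()
finder.feed(page)
finder.close()
for inside, text in finder.titles:
    print("hCard job title" if inside else "generic heading", "->", text)
# generic heading -> Quarterly report
# hCard job title -> Editor
```

Both elements carry the same class name; only the enclosing vcard root distinguishes them, which is why heavy non-microformat use of "title" is a potential source of confusion.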

Looking at the hyperlink data, it's interesting to see that rel="nofollow" drastically outnumbers every other link relationship, which could have something to do with the sheer volume of blog comments whose links carry this controversial value. rel="license" also features prominently, perhaps because of its obvious utility. Other relationships appear more patchily, but still have enough of a presence to be significant.

I don't fully agree with their comment about the ridiculously vague definition of HTML link types. If anything, I think it is more indicative of a shift in vocabulary over recent years. The HTML specification was written in the late 1990s, when most weblog-like sites were still coded by hand and the concept of a permalink didn't exist. HTML 4.01 defines a bookmark as "a link to a key entry point within an extended document", which is essentially what we now call a permalink. Read in this context, the intent expressed by "bookmark" is quite clear, which is why this link type is being made part of the hAtom format.
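As a rough illustration of how hAtom puts the link type to work, here is a minimal Python sketch (the PermalinkFinder class and the sample markup are hypothetical) that reads each entry's permalink from the first rel="bookmark" link inside an element with class="hentry". It splits rel on whitespace, since the attribute holds a space-separated list of link types, and lowercases it, since link types are case-insensitive.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PermalinkFinder(HTMLParser):
    """Hypothetical helper: first rel="bookmark" href inside each class="hentry"."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one entry per open element: index of enclosing entry, or None
        self.permalinks = []  # one slot per hentry, None until a bookmark is found

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        entry = self.stack[-1] if self.stack else None
        if "hentry" in classes:
            self.permalinks.append(None)
            entry = len(self.permalinks) - 1
        rels = (a.get("rel") or "").lower().split()
        if (tag == "a" and "bookmark" in rels and entry is not None
                and self.permalinks[entry] is None):
            self.permalinks[entry] = a.get("href")
        self.stack.append(entry)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


page = """
<div class="hentry">
  <h3 class="entry-title"><a rel="bookmark" href="/posts/survey">Web Authoring Survey</a></h3>
  <p class="entry-content">Some text with <a href="/about">an ordinary link</a>.</p>
</div>
<div class="hentry">
  <h3 class="entry-title"><a rel="bookmark" href="/posts/bookmarks">On bookmarks</a></h3>
</div>
"""

finder = PermalinkFinder()
finder.feed(page)
finder.close()
print(finder.permalinks)  # ['/posts/survey', '/posts/bookmarks']
```

The ordinary link in the first entry is ignored because it carries no bookmark relationship; only the link the author explicitly marked as the entry point is treated as the permalink.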