On Semantic HTML

Having slogged through the trenches on some pretty large-scale HTML and CSS builds, I tend to have a somewhat jaundiced view of declarative syntax. There are only so many hundreds of hours you can spend squeezing bizarre layout bugs out of stylesheets before you start to go stark raving mad.

But maybe there’s light at the end of the tunnel. Many fellow designers and web developers would agree on just how incredibly liberating and rewarding it was to finally kick Netscape 4 out the back door. And back in 2003, after many fingernail-grinding days of prototyping and browser testing, the moment of getting the Te Ara entry pages to display near perfectly in IE4/PC was a high that I’ll always remember. This validated the assumption that CSS had always been a capable technology, and that the problems faced by designers were less to do with poor standards support, and more to do with flat-out cognitive dissonance.

Slowly but surely, over the past 4-5 years of focusing on pure CSS design, I’ve been internalizing a catalog of patterns and layout structures that I use time and time again. It’s brilliant to be able to look at a site architecture on paper, or a visual mockup on screen, and translate it into a logical set of tag components that encapsulate the essence of this structure, without the arbitrary cruft and presentational hackiness that defined the HTML web where I grew up.

So an inevitable question that arises from reflecting on this experience of developing patterns of stock CSS and HTML connectives is: how should they be documented and communicated? Or should they be at all?

The concept of microformats rides on the idea that HTML can express equivalent semantics to that of a specifically defined data format. There are already some fairly impressive examples of what is possible with the modularization of XHTML, but the microformat way emphasises the ubiquity of basic HTML in its most simple form, a huge area of potential that is beginning to be tapped.

I’m not interested in arguments about whether namespaces are misplaced cases of Chicken Little (my problem is that I see both points, and find it hard to disagree with either). And I can hardly further the claims that RDF is an over-intellectualized technology with no practical benefits when I’ve seen its immense possibilities in action. But the point is that first and foremost, the web is about the visual display of information. And if that visual information carries relevant semantics as a matter of course, it could be a win-win situation for everyone on both sides of the screen/server divide.

One of the nice things about microformats is the fixed structures they define, which are really straightforward to parse (if you don’t let optional class selectors or nesting get too out of control). Having spent a bit of time experimenting with an RTF parser in PHP, even the messiest HTML seems ultra-structured to me now. I don’t develop glyph-based document editors, so that’s probably why. It just seems natural that HTML, which is, in essence, a meta-vocabulary for plain old writing, can be induced to carry more expressive semantics than just pure structural markup.
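To make the parsing point a bit more concrete, here is a rough sketch in Python of pulling a few fields out of an hCard-style fragment using only the standard library's `html.parser`. The sample markup, the `HCardParser` class, and the `parse_hcards` helper are all invented for illustration, and the sketch only looks at three class names (`fn`, `org`, `url`) inside a `vcard` container. It is nowhere near a complete microformat parser, just an indication of how far a fixed class vocabulary gets you.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: class names follow the hCard convention,
# the person and organisation are made up.
SAMPLE = """
<div class="vcard">
  <a class="url fn" href="http://example.com/">Jane Example</a>
  <span class="org">Example Studio</span>
</div>
"""

FIELDS = {"fn", "org", "url"}                       # only these are extracted
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # never get a close tag


class HCardParser(HTMLParser):
    """Collects one dict per element carrying class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self._card = None    # the card currently being filled, if any
        self._depth = 0      # element depth inside the current vcard
        self._capture = []   # stack of (depth, field names, text chunks)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if self._card is None:
            if "vcard" in classes:
                self._card, self._depth = {}, 1
            return
        self._depth += 1
        names = classes & FIELDS
        # A link marked "url" contributes its href rather than its text.
        if "url" in names and attrs.get("href"):
            self._card["url"] = attrs["href"]
            names = names - {"url"}
        if names:
            self._capture.append((self._depth, names, []))

    def handle_data(self, data):
        # Text belongs to every field element that is still open.
        for _, _, chunks in self._capture:
            chunks.append(data)

    def handle_endtag(self, tag):
        if self._card is None or tag in VOID_TAGS:
            return
        if self._capture and self._capture[-1][0] == self._depth:
            _, names, chunks = self._capture.pop()
            text = " ".join("".join(chunks).split())  # collapse whitespace
            for name in names:
                self._card.setdefault(name, text)     # first value wins
        self._depth -= 1
        if self._depth == 0:
            self.cards.append(self._card)
            self._card = None


def parse_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


if __name__ == "__main__":
    print(parse_hcards(SAMPLE))
```

The depth counter is the part that gets unwieldy once nesting or optional classes pile up, which is exactly the caveat above. The sketch also assumes reasonably well-formed markup inside the `vcard` element.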

One thing that does continue to irk me about microformats, though, is that they seem to lock down the syntax of certain concepts that might have other equally valid expressions in HTML. I guess we all have to agree on something. It’s interesting that the most prominent and successful integrations of semantic technology into HTML so far have been based on structured linking formats – take RSS, Atom, and relTag as the best examples of leveraging this extensibility. But beyond anchors and links, a distinctly different situation arises. It’s content bitch: much less meta, much more relationship to writing and visual design.

So my current strategy is just to focus on the currently accepted microformat definitions for public/collaborative data, but otherwise to try to ignore the specificities of extracting microformats per se, and just leverage HTML semantics in general, more and more throughout all aspects of my design and publishing work. I’ve been hand-writing HTML for 7 years or so; it is a relatively natural and expressive language to me, and I’d like to see it reach its full potential. I appreciate that not everyone sees it this way, but I think there are a lot of people who do. Making more structured information available for backend consumers is only half of the picture. The real challenge for designers in this emerging infoscape is to make it easier to share and communicate meaningful ideas beyond the angle brackets.