Notes Information Apocalypse

Resource Dissonance Format

Recently, I've been looking for good case studies of the practical benefits of RDF adoption, and while I've found a little interesting stuff, I've been noticing a severe dissonance between the volume of useful information out there and the volume of hype, bombast, needless jargon, and misplaced criticism.

One significant aspect of this RDF dissonance is that many people have a tendency not to see beyond its expression in XML. I'm starting to realise how important it is to think of it as a graph format, first and foremost, and not to get too caught up on general syntactic woes. Unfortunately for application/rdf+xml pushers, the problems with namespaces cross-cut this syntactic-semantic divide, but there are always alternatives.
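To make the "graph first, syntax second" point concrete, here is a minimal sketch in Python that holds a few statements as plain (subject, predicate, object) tuples, with no XML anywhere. The example.org URIs, the author name, and the helper functions objects and neighbours are all made up for illustration; only the Dublin Core title and creator URIs are real vocabulary terms.

```python
# A tiny RDF-style graph: a set of (subject, predicate, object) triples.
# The example.org URIs and the author name are hypothetical placeholders.
EX = "http://example.org/"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

graph = {
    (EX + "post/1", DC_TITLE, "Resource Dissonance Format"),
    (EX + "post/1", DC_CREATOR, EX + "people/author"),
    (EX + "people/author", EX + "name", "A. N. Author"),
}


def objects(graph, subject, predicate):
    """Return every object attached to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def neighbours(graph, node):
    """Return every node or literal one edge away from `node`, sorted."""
    return sorted(o for s, _, o in graph if s == node)


if __name__ == "__main__":
    # Follow the creator edge from the post, then look up that node's name.
    for creator in objects(graph, EX + "post/1", DC_CREATOR):
        print(creator, objects(graph, creator, EX + "name"))
```

The same set of tuples could be written out as RDF/XML, N-Triples, or anything else; the graph itself does not change, which is the sense in which the XML expression is incidental.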

The main (interrelated) issues with RDF that I see are much more tricky to resolve than mere quirks of XML adoption:

  1. The statement model doesn't map to the intuitive mental models of publishers, authors, and designers who are developing web content or information architectures. It doesn't account for the importance of visible metadata, and thus de-emphasises the way that most people think about hypertext content.
  2. The extreme significance of URIs in RDF leads to a level of semantic indirection that is hard to visualize coherently in a single graph. It simultaneously emphasises singular interlocking descriptions, with decentralized and distributed definitions, breaking a significant principle of metadata value.
  3. Facts are fluid, history is debatable, people lie, or misconstrue the truth, meaning is never fixed, and natural language ontologies are filled with contradictions and inconsistencies. Fighting this with logic is surely going to be futile, which places a heavy burden on interpretation, even though this seems to be what RDF set out to relieve in the first place.

The semantic web community tends to emphasise the importance of well-defined meaning first and foremost, which can be somewhat at odds with the large scale behaviour of web users. Tagging is probably so popular because it is a syntactic rather than semantic way of describing things, and thus places no ontological burden on users or developers. This surely leads to accumulating mountains of ill-defined mess, but at the same time, it broadens the reach of participation on the web, and extends the concept of publishing and architecture to encompass organic and reflexive editing and community growth.

A crossover between the abstract semantic model of RDF and the concrete visible semantics of HTML is emerging with the work being done on GRDDL. The clunkiness of the acronym is probably the only thing holding this approach back - it directly addresses the major problem of non-visible metadata mentioned above, placing the resource description as an extraction from a well-understood HTML document source.
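The general idea of extracting a description from visible markup can be sketched roughly as follows. This is not GRDDL itself, which works by associating a document with transformations (typically XSLT) that produce RDF; it is only a simplified stand-in using Python's standard html.parser, where each visible link carrying a rel attribute becomes one statement about the page. The class name RelLinkExtractor, the sample markup, and the example.org URIs are assumptions made up for this sketch.

```python
from html.parser import HTMLParser


class RelLinkExtractor(HTMLParser):
    """Collect (page, rel, href) statements from <a rel="..." href="..."> links."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel may hold several space-separated values; emit one triple per value.
        for rel in (attributes.get("rel") or "").split():
            if href:
                self.triples.append((self.page_uri, rel, href))


# Hypothetical page fragment: the metadata is the same text a reader sees.
SAMPLE_HTML = (
    '<p>Filed under <a rel="tag" href="http://example.org/tags/rdf">rdf</a> '
    'by <a rel="author" href="http://example.org/people/author">the author</a>, '
    'see also <a href="http://example.org/other">this post</a>.</p>'
)

extractor = RelLinkExtractor("http://example.org/post/1")
extractor.feed(SAMPLE_HTML)

if __name__ == "__main__":
    for triple in extractor.triples:
        print(triple)
```

The link without a rel attribute produces no statement, so the author decides what counts as metadata simply by how the visible links are marked up.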

From my limited understanding, the power of RDF to disambiguate is not clear in many situations, and it is obviously more suited to massively distributed application contexts where the value of organic approaches can be limited by some of the inherent instabilities in social software. These contexts are perhaps where resource descriptions could be of some benefit without needing to throw in heavyweight natural language processing and machine learning.

I still haven't fully made up my mind on much of this. I can see some significant forces preventing the wider adoption of RDF models by publishers, but I can also see that the graph model itself is so well optimized for the bigger picture of the web. It would be great to find a unifying vision of RDF and HTML technologies that doesn't just blindly evangelize the semantic web and ignore the messy realities of natural language. At this stage, however, I'm just not sure where to look.