Shelley Powers has a piece on RDF/XML syntax, making some concessions to the anti- brigade.
The biggest concern I see with RDF/XML from an XML perspective is its flexibility. One can use two different XML syntaxes and still arrive at the same RDF model, and this must just play havoc with the souls of XML folks.…
However, compatibility between the RDF/XML and XML versions of RSS is much thinner than my previous essay might lead one to believe. In fact, looking at RSS as a demonstration of the "XMLness" of RDF/XML causes you to miss the bigger picture, which is that RSS is basically a very simple, hierarchical syndication format that's quite natural for XML; its very nature tends to drive out the inherent XML behavior within RDF/XML, creating a great deal of compatibility between the two formats. Compatibility that can be busted in a blink of an eye.
However, as Shelley observes, RSS doesn't make a good case study for why some applications need the graph structure of RDF rather than the XML tree.
RSS is basically a very simple, hierarchical syndication format that's quite natural for XML
There's simply no graph there, unless you are massaging the RDF into a rather more comprehensive data store a la Haystack.
channel
  items
    item
      title
      description
      …
(Forgive any inaccuracies; I'm no RSS guru.)
Simon St. Laurent responds:
The next sentence worries me a bit: "The very flexibility of the syntax must be anathema to XML purists."
It's not precisely flexibility that's the problem. …
I think for me the problem really comes down to the classic mismatch between trees and graphs. By design, XML is simply more limited than RDF - it's a directed acyclic graph, and children can't have more than one parent. Those limitations have made it a lot easier to write tools for XML, from SAX to XPath and on up. While those limitations are certainly jarring for free-form graph folks, they make it far easier to accomplish a lot of common tasks.
It seems strange to call XML a directed acyclic graph. It's not. It's a tree. Sure, a tree is a specialisation of a DAG, just as a DAG is a specialisation of a directed (possibly cyclic) graph. But to describe a tree as a directed acyclic graph in which "children can't have more than one parent" desperately underplays the difference between the two.
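To make the distinction concrete, here is a small Python sketch (an illustration, not anything from the original discussion) that checks a set of parent-to-child edges against both definitions. A DAG only has to be free of cycles; a rooted tree additionally needs a single root and exactly one parent for every other node, which is what lets it be written out as nested elements without repeating anything. The function names and the example edges are hypothetical.

```python
from collections import defaultdict

def _index(edges):
    """Return the node set plus parent and child maps for (parent, child) edges."""
    nodes, parents, children = set(), defaultdict(set), defaultdict(set)
    for p, c in set(edges):  # ignore duplicate edges
        nodes.update((p, c))
        parents[c].add(p)
        children[p].add(c)
    return nodes, parents, children

def is_dag(edges):
    """True if the directed graph has no cycles (Kahn's topological sort)."""
    nodes, parents, children = _index(edges)
    indegree = {n: len(parents[n]) for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return removed == len(nodes)  # nodes never removed lie on or below a cycle

def is_tree(edges):
    """True for a rooted tree: acyclic, one root, exactly one parent elsewhere."""
    nodes, parents, _ = _index(edges)
    roots = [n for n in nodes if not parents[n]]
    return (len(roots) == 1 and is_dag(edges)
            and all(len(parents[n]) == 1 for n in nodes if n != roots[0]))

rss_like = [("channel", "title"), ("channel", "items"), ("items", "item")]
diamond = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c")]
print(is_tree(rss_like), is_dag(diamond), is_tree(diamond))  # True True False
```

The diamond is the interesting case: it is a perfectly good DAG, yet no nesting of elements can represent node `c` exactly once without some kind of reference.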
This mismatch between trees and graphs is at the core of a lot of my soapbox rants. We don't live in a tree-structured world, but our information management tools at best force us to map our world into a (single) tree structure: hierarchical filesystems, hierarchical filesystem-metaphor folder spaces in email tools, and so on.
It is surely the case that its "limitations have made it a lot easier to write tools for XML". But those same limitations also limit what tools you can write with (as distinct from for) XML.
It's funny to me also that the one branch of the XML family of specs that's really supposed to handle graphs, XLink, hasn't caught on at all. (Nor has XPointer, which was supposed to let people point to parts of XML across element boundaries.) People using XML, for better or worse, seem to see XML itself as just trees, and handle the graphs in their programs, not in the syntactic representation.
Is it that funny? People find it more difficult to visualise graphs than trees, and the fine-grained and rather uncontrolled graphs of RDF are particularly hard to visualise in a useful way (more on this later). Trees, on the other hand, are easy, particularly the small, digestible trees that make up most XML documents. They are easy to visualise, and the tools that work on them are easy to understand.
Tools which conduct graph operations over collections of tree-form documents are handicapped from the outset: their intended users not only find it easier to think in trees than in graphs, but are conditioned to do so by the structure of the documents they work with and by all the other tools they use on those documents. Tools which layer graph structures over XML document collections add a whole new level of abstraction, and demand a substantial change in their users' mental model.
As Koestler, among others, observes, the tools we use (including language, which was Koestler's particular gripe) condition the way we think. They make some things easy and other things hard, and we tend not to get frustrated by the hard things; we just forget they exist. Similarly, Davis, Shrobe, and Szolovits (1993) describe a knowledge representation as "a strong pair of glasses": it brings some things into focus at the cost of blurring others.
RDF is hardly the only case where graphs collide with XML trees. Object structures, despite their use of hierarchies, are often graphs that look more like RDF than XML. Relational databases are built on top of graph foundations. Sure, you can put the tabular aspects of the data into XML, but XML doesn't fare nearly as well with the relationship aspects - all those graphs!
In the course of my PhD, I developed a simple, ad hoc XML syntax for specifying the (directed, possibly cyclic) data flow graph structure of numerical models. This specification was then used to drive the assembly of modules implementing the nodes into larger models, which can in turn be further composed. Without knowing anything about RDF, I slowly came to understand that although the composition structure is hierarchical, and fits well with the XML world view, the connection structure is a graph.
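A minimal Python sketch of the kind of syntax described above, assuming a hypothetical vocabulary (`model`, `module` and `connect` elements with `id`, `from` and `to` attributes) rather than the actual syntax from the thesis. Nesting records composition, which the XML tree represents directly; the `connect` elements carry the data flow graph, which can only be recovered by resolving id references, and which may well contain cycles.

```python
import xml.etree.ElementTree as ET

# Hypothetical syntax: nesting = composition, <connect> = data flow.
MODEL_XML = """<model id="loop">
  <model id="plant">
    <module id="integrator"/>
    <module id="gain"/>
    <connect from="integrator" to="gain"/>
  </model>
  <module id="controller"/>
  <connect from="gain" to="controller"/>
  <connect from="controller" to="integrator"/>
</model>"""

def read_model(text):
    """Split a model document into its composition tree and its connection graph."""
    root = ET.fromstring(text)
    composition = {}  # child id -> parent id: the tree that XML nesting gives for free
    connections = []  # (source id, target id): the graph, carried by id references
    def walk(elem):
        for child in elem:
            if child.tag == "connect":
                connections.append((child.get("from"), child.get("to")))
            else:
                composition[child.get("id")] = elem.get("id")
                walk(child)
    walk(root)
    known = set(composition) | {root.get("id")}
    unresolved = [e for e in connections if not set(e) <= known]
    if unresolved:
        raise ValueError(f"unresolved references: {unresolved}")
    return composition, connections

def has_cycle(edges):
    """Depth-first search for a back edge in the connection graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards
    def visit(n):
        if state.get(n) == "active":
            return True
        if state.get(n) == "done":
            return False
        state[n] = "active"
        found = any(visit(m) for m in graph.get(n, []))
        state[n] = "done"
        return found
    return any(visit(n) for n in list(graph))

composition, connections = read_model(MODEL_XML)
print(composition)             # the tree: who contains whom
print(has_cycle(connections))  # True: integrator -> gain -> controller -> integrator
```

The parser gets the composition hierarchy almost for free; the connections need a separate resolution step, which is exactly where the graph shows through the tree.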
I am now working on implementing a similar structure as an application of RDF (using Ora Lassila's Wilbur, a Lisp RDF framework, which also lets me use the power of Lisp for the dynamic construction of programs). There are many reasons for this switch, but an important one is that there seems little point in tackling again the issues RDF already addresses, and passing up the chance to use the available tools in the process. Of course, the basic RDF ontology -- nodes and named arcs -- is seriously impoverished, and additional layers are needed, but those layers sit reasonably neatly on top of RDF; they don't argue with it.
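Wilbur's own API is not shown here. As a language-neutral illustration only, the following Python sketch models the bare RDF ontology as a set of subject-predicate-object triples with wildcard matching, and then adds one small layer on top: a hypothetical `upstream` query that follows `feeds` arcs transitively. The predicate names `feeds` and `type` are made up for the example; real RDF would use URIs. The layer is written entirely in terms of the basic pattern match, which is one sense in which such layers need not argue with the base model.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str

class Graph:
    """A toy triple store: nodes and named arcs, and nothing else."""
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Yield every triple matching the pattern; None acts as a wildcard."""
        for t in self.triples:
            if ((subject is None or t.subject == subject)
                    and (predicate is None or t.predicate == predicate)
                    and (obj is None or t.object == obj)):
                yield t

def upstream(graph, node):
    """A layer above the bare arcs: every node that feeds `node`, directly or not."""
    seen, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for t in graph.match(predicate="feeds", obj=current):
            if t.subject not in seen:
                seen.add(t.subject)
                frontier.append(t.subject)
    return seen

g = Graph()
g.add("integrator", "feeds", "gain")
g.add("gain", "feeds", "controller")
g.add("controller", "feeds", "integrator")
g.add("sensor", "feeds", "controller")
g.add("gain", "type", "Module")
print(sorted(upstream(g, "controller")))  # ['controller', 'gain', 'integrator', 'sensor']
```

Because the connections form a cycle, `controller` appears in its own upstream set; the bare triples neither prevent nor explain that, which is the kind of thing the additional layers have to deal with.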
People using XML, for better or worse, seem to see XML itself as just trees, and handle the graphs in their programs, not in the syntactic representation.
Keeping the graphs out of the syntax also makes life a lot easier for those of us who work with the syntax directly, which I really prefer when possible. Maybe direct contact with angle brackets is limiting, but it lets me get all kinds of work done without writing much code!
This might work if the graphs implied in the tree are trivial, or if the graph is wholly structured at the super-document scale -- by which I mean that only whole documents act as nodes, and no graph is implied within a document -- but where a document must necessarily be a serialisation of a graph, the pretence of "handling the graph in the program" breaks down. As soon as node ids and cross-references appear in an XML syntax, the graph is leaking out of the program and into the syntax.
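The point about ids can be demonstrated directly. This Python sketch (node and element names are hypothetical) serialises the same graph two ways: by naive nesting, which copies any node that has two parents and cannot terminate on a cycle without an explicit check, and as a flat list of nodes with id references, which represents each node once at the cost of putting the graph structure into the syntax.

```python
import xml.etree.ElementTree as ET

def nest(graph, node, parent=None, path=()):
    """Naive tree serialisation: every child is written out under its parent."""
    if node in path:
        raise ValueError(f"cycle through {node!r}; nesting would never end")
    if parent is None:
        elem = ET.Element("node", name=node)
    else:
        elem = ET.SubElement(parent, "node", name=node)
    for child in graph.get(node, []):
        nest(graph, child, elem, path + (node,))
    return elem

def with_ids(graph):
    """Flat serialisation: each node appears once; arcs become id references."""
    root = ET.Element("graph")
    names = set(graph) | {c for children in graph.values() for c in children}
    for name in sorted(names):
        elem = ET.SubElement(root, "node", id=name)
        for child in graph.get(name, []):
            ET.SubElement(elem, "ref", to=child)
    return root

def count(elem, attr, value):
    """Count <node> elements whose attribute `attr` equals `value`."""
    return sum(1 for e in elem.iter("node") if e.get(attr) == value)

diamond = {"model": ["a", "b"], "a": ["shared"], "b": ["shared"]}
print(count(nest(diamond, "model"), "name", "shared"))  # 2: the shared node is copied
print(count(with_ids(diamond), "id", "shared"))         # 1: identity kept via refs
```

The flat form handles cycles as easily as it handles sharing, but a reader of the XML now has to resolve `ref` elements to see the structure, which is the leak described above.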
I appreciate the value in working directly with the syntax, and indeed tend to support the view that passing data round in XML form and automating translation in and out of RDF stores has some merit. In fact, I don't expect to suggest to future users of the modelling software I am working on that they define their models by editing RDF/XML, although that option, and that of using N-Triples, should be available. I can provide them with a digestible syntax for defining models, optimised for that task, which has an unambiguous RDF representation, and I don't think I've lost anything. That syntax will be problem-specific, but the underlying RDF ensures that generic tools can be used against the data too.
I guess this qualifies to an extent as handling the graph in the program; but it is explicitly handling the graph in the serialisation/deserialisation process. As a result, the relationship between serialised form and graph is clear.
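One way to read "explicitly handling the graph in the serialisation/deserialisation process" is as a pair of functions with a checkable round trip. The Python sketch below assumes a hypothetical problem-specific syntax and made-up predicate names (`ex:contains`, `ex:feeds`; real RDF would use full URIs). Reading the syntax yields a set of triples; writing triples back produces a canonical document, with every `connect` element placed at the top level, that reads back to exactly the same triples.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary; real RDF would use full URIs here.
TYPE, CONTAINS, FEEDS = "rdf:type", "ex:contains", "ex:feeds"

def to_triples(text):
    """Deserialise: the problem-specific syntax becomes a set of triples."""
    root = ET.fromstring(text)
    triples = {(root.get("id"), TYPE, "ex:" + root.tag)}
    def walk(elem):
        for child in elem:
            if child.tag == "connect":
                triples.add((child.get("from"), FEEDS, child.get("to")))
            else:
                triples.add((child.get("id"), TYPE, "ex:" + child.tag))
                triples.add((elem.get("id"), CONTAINS, child.get("id")))
                walk(child)
    walk(root)
    return triples

def from_triples(triples):
    """Serialise: triples become a canonical document in the same syntax."""
    kind = {s: o[len("ex:"):] for s, p, o in triples if p == TYPE}
    contained = {o for s, p, o in triples if p == CONTAINS}
    # Raises ValueError unless there is exactly one outermost element.
    (root_id,) = [n for n in kind if n not in contained]
    def build(node_id, parent):
        attrs = {"id": node_id}
        if parent is None:
            elem = ET.Element(kind[node_id], attrs)
        else:
            elem = ET.SubElement(parent, kind[node_id], attrs)
        for s, p, o in sorted(triples):
            if p == CONTAINS and s == node_id:
                build(o, elem)
        return elem
    root = build(root_id, None)
    for s, p, o in sorted(triples):
        if p == FEEDS:  # canonical form: all connections at the top level
            ET.SubElement(root, "connect", {"from": s, "to": o})
    return ET.tostring(root, encoding="unicode")

MODEL_XML = """<model id="loop">
  <model id="plant">
    <module id="integrator"/>
    <connect from="integrator" to="controller"/>
  </model>
  <module id="controller"/>
  <connect from="controller" to="integrator"/>
</model>"""

triples = to_triples(MODEL_XML)
print(to_triples(from_triples(triples)) == triples)  # True
```

The design choice is to make the graph, not the document, the thing that must survive a round trip: where the original placed its `connect` elements is not preserved, but nothing the triples record is lost.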
The important issue is that a uniform and simple base metamodel exists, and the rest is built up from there in layers. Which layer you work in is then up to you.
I have a lot more thinking to do about this, and will no doubt keep coming back to it as I get more code working. The problem, as I see it, is that contrary to Alan Kay's assertion that
"Simple things should be simple. Complex things should be possible"
we have a choice between XML, in which simple things are simple but complex things are way too hard (or impossible), and RDF, which makes the complex possible but doesn't seem up to scratch with the simple things. Again, perhaps the solution lies in experiments with mappings, possibly domain-specific, between the two.