Digital information is different

This is beautifully done, and hits a few nails on the head:

Merlin has a few worthwhile comments on the strengths of paper. Also missing, from my point of view, is the fact that a lot of what we do with computers now (pretty much everything that's off the web, for a start) needlessly recreates the limitations of paper and adds a few new problems for good measure. Now where did I put that file ...

Tim Bray interview at ACM Queue

Another interesting interview over at ACM Queue: Tim Bray, mostly on XML and RDF.

The XML serialization of RDF is horrible; it’s a botched job.

Tim Bray

Amen. Shame that.

[RDF] quickly became mixed up with a whole bunch of classic KR (knowledge representation) people who wanted to go refight the AI wars of the ’80s. … You know, KR didn’t suddenly become easy just because it’s got pointy brackets. Doug Lenat has been off working in the desert on that for decades and nobody has ever made a buck on it yet, as far as I know.

Tim Bray

There surely has been a buck made on KR, albeit not on Lenat's work (the Cyc project). Representing a significant amount of human knowledge formally is perhaps way too expensive to give a positive ROI, but formal logic-based languages have great promise in far more situations than they are currently used in. The current state of RDF is that it is too limited to be of much help there, and this is a pretty unsurprising result of its history. What was supposed to be a language for encoding metadata has been dragged into being a KR language, but it remains a seriously limited one. In the meantime it attracts a load of funding at the expense of other work.

Limitations of RDF as a KR language are limitations, too, as a metadata framework. After all, what use is a metadata framework intended for deployment on the Internet that can't represent provenance of data? This is being worked on, of course, but if RDF had been based on an existing KR language which supported multiple contexts, it wouldn't be an issue now.
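To make the point about contexts concrete, here is a minimal Python sketch (not RDF syntax, and not the API of any RDF toolkit) of statements stored with a fourth element naming the context in which they were asserted. `ContextStore`, the `ex:` terms and the example.org source URIs are all made up for illustration. A store of bare triples has nowhere to record who said what; with contexts, the provenance question becomes a simple lookup.

```python
from collections import defaultdict

# A plain triple: (subject, predicate, object). On its own it records nothing
# about who asserted it.
Triple = tuple[str, str, str]


class ContextStore:
    """Hypothetical in-memory store grouping triples by the context that asserts them."""

    def __init__(self) -> None:
        self._by_context: dict[str, set[Triple]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str, context: str) -> None:
        # The fourth element (context) is what a bare triple store lacks.
        self._by_context[context].add((subject, predicate, obj))

    def triples(self, context: str) -> set[Triple]:
        """All statements asserted within one context (empty set if unknown)."""
        return set(self._by_context.get(context, set()))

    def contexts_asserting(self, triple: Triple) -> set[str]:
        """Provenance query: which contexts assert this triple?"""
        return {c for c, ts in self._by_context.items() if triple in ts}


store = ContextStore()
claim = ("ex:rdf", "ex:intendedFor", "ex:Metadata")
store.add(*claim, context="http://example.org/sourceA")
store.add(*claim, context="http://example.org/sourceB")
store.add("ex:rdf", "ex:usedAs", "ex:KRLanguage", context="http://example.org/sourceB")
print(sorted(store.contexts_asserting(claim)))
```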

Taxonomy, folksonomy, and value

Let's try again.

Shelley Powers has written a good summary of some arguments about the relative merits of tagging and more formal metadata. Nice pics, too.

Shelley is worried that the rise of "cheap" metadata, in particular tags, will inhibit development of tools to make "real" metadata available to the masses.

Theory: the value of terms in any metadata system used in an uncontrolled way tends to zero with time.

That's just as true for rich, formal so-called “ontologies” as it is for tag soup. There is a difference, however. Tags don't lie about their status. We know, right off, that they're just terms which people have attached to things, for some reason. We know that we don't know what sense the term is meant in, of the undoubtedly many possible. We know that we don't know the intention of the tagger in tagging. The spam problem arises from this: spammers are applying tags in order that pictures show up in certain places, not because they think those tags are relevant. The business of tagging Flickr photos with the "offensive" tag is another case in point, where the intention is now to stop items showing up in certain places, and really has nothing to do with offensiveness.

Problematic, perhaps, but not unique to tagging. Exactly the same is true of uncontrolled use of any classification system, however good the tools are which provide access to it. Things will be tagged in a way which is inconsistent with the intended use of any system, either through ignorance or intent.

The rel="nofollow" tag which Google has announced it will use to control the inclusion of links in its PageRank calculations is indeed a nice demonstrator of an important point here. Its meaning is very well defined: it is grounded not in human understanding, but in machine behaviour. Nofollow can't be spammed: either Google honours it, or they don't. The web now has a mechanism for systematically withholding page rank. What the web does with that mechanism is out of Google's control.
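The "machine behaviour" that gives nofollow its meaning is easy to sketch. Below is a minimal Python illustration, using only the standard library's `html.parser`, of a link collector that a hypothetical link-graph builder might use: links whose `rel` attribute contains the `nofollow` token are set aside rather than counted. This is an assumption about how such a crawler could behave, not Google's implementation; `LinkCollector` and the example URLs are made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect <a href> targets, separating those marked rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []  # links that would count towards rank
        self.nofollow: list[str] = []  # links deliberately withheld

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel holds space-separated tokens, e.g. rel="external nofollow";
        # an attribute with no value arrives as None, hence the `or ""`.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


page = """<p>See <a href="http://example.org/good">this</a> and a comment link
<a rel="external nofollow" href="http://spam.example/">cheap pills</a>.</p>"""
collector = LinkCollector()
collector.feed(page)
print(collector.followed)  # ['http://example.org/good']
print(collector.nofollow)  # ['http://spam.example/']
```

There is no interpretation involved: the token is either present or it isn't, and the program does what it does.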

The point is that some terms have meaning in this mechanical way, like those which make up a programming language. Others have meaning in a fluid, human way. The latter will always be as fluid as they are human; no fancy tools will change that.

Fancy tools to make use of formal taxonomies easier would be great, of course. Very useful for those who care enough to put in the effort to understand, and make sure they stick to, the formal intended meanings of terms. But however easy that is made, it will always be hard, and people will always get it wrong. And as people get it wrong, the extensional definition of the term becomes less precise, and the value of the term tends to zero. You might slow the process down, but you can't, simply can't, stop it.

And, of course, because the effort involved in understanding and carefully applying the terms of a formal taxonomy is relatively high — independently of tools, which can reduce the accidental difficulties but cannot attack the essence — it is just not useful in the sort of "fire and forget" systems like del.icio.us, where any effort is too much.

In the comments to Shelley's post, Nick Sweeney writes:

I’m a bit of a Saussurean about this, in that I think that taxonomy (or ontology, depending upon your disciplinary point of origin) is crystallised/calcified folksonomy. Authorised folksonomy, if you like.

Right on! Those two terms, “crystallised” and “calcified”, are a really useful duo. The one has connotations of order, beauty, and value. The other of dull rigidity. Well designed formal taxonomies have the first of those in abundance. Both have the second, and you can't avoid it. A taxonomy can only ever be an encoding of one incomplete and imperfect view of the world.

Shelley:

I agree with Clay that the semantic web is going to be built ‘by the people’, but it won’t be built on chaos. In other words, 100 monkeys typing long enough will NOT write Shakespeare; nor will a 100 million people randomly forming associations create the semantic web.

100 monkeys indeed will not write Shakespeare, but 100 Shakespeares didn't write Shakespeare either. Shakespeare did. The situation we have is that of 100 million people (or, at any rate, some large and growing number) who need tools to organise stuff, and to help them find stuff. 100 million people who will go right on randomly forming associations, whether the results pretend to be anything other than randomly formed associations or not.

We're dealing here not with the formal space of so-called ontologies and logic, but with the messy, human space of language and meaning. The processes involved are, essentially, social. Tools which deny this will fail.

Shelley:

Clay believes that ultimately ontologies will fall to folksonomies, as the latter gain rapid acceptance because of their low cost and ease of use; I believe that ultimately interest in folksonomies will go the way of most memes, in that they’re fun to play with, but eventually we want something that won’t splinter, crack, and stumble the very first day it’s released.

When you design a building to ride out earthquakes with minimal damage, you build in flexibility. Of course, the clever part is in deciding where and how to build in that flexibility. Tagging provides an abundance of flexibility, so much that the building can barely stand up.

But in some ways it works. Delicious is already very useful, on an individual basis, and has some use at a social scale. I follow what other people post about Lisp, Scheme, and Smalltalk for example. These are well defined terms, referring to programming languages rather than highly ambiguous and mis- or differently interpretable concepts. With less well defined concepts, the problems lie as much in the nature of the concepts as in the tagging approach.

Simply aggregating tags à la Technorati as it stands doesn't show a great deal of promise, but it is possible to envisage tools which show tag clouds and, by presenting these to the user, encourage convergence around particular tags. The results will never be complete, never perfect, but always changing, and sometimes useful.
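As a rough illustration of the sort of tool I mean, here is a minimal Python sketch of a tag cloud: count how often each tag has been used, then scale a display size by the logarithm of that count, so that popular forms stand out and are more likely to be reused. The function name, the case-folding (which merges "Lisp" and "lisp") and the log scaling are my assumptions for illustration, not how Technorati or del.icio.us actually do it.

```python
import math
from collections import Counter


def tag_cloud(taggings, min_size=10, max_size=32):
    """Map each tag to a font size in px, scaled by the log of its usage count.

    Hypothetical helper: tags are case-folded so trivially different spellings
    converge on one form.
    """
    counts = Counter(tag.lower() for tag in taggings)
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # all counts equal: avoid dividing by zero
    return {
        tag: round(min_size + (math.log(n) - lo) / span * (max_size - min_size))
        for tag, n in counts.most_common()
    }


taggings = ["lisp", "Lisp", "scheme", "lisp", "smalltalk", "scheme", "lisp"]
cloud = tag_cloud(taggings)
print(cloud)
```

Log scaling keeps one runaway tag from dwarfing everything else; showing the user the cloud before they tag is where the nudge towards convergence would come from.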

When you have a Web-load of people, things happen from the bottom up. That's the way the Web works.

Jailed for using a nonstandard browser?

How's this for two sides of the same story?

It matters because it has a URL

Blogs matter because they have a URL. They have a permanent web address. So often on the Internet, or in technology, but particularly on the Internet because of the social effects, a little tiny thing makes a huge difference. So this is a little thing, that weblogs have a permanent web address, because what that does for them is give them permanence, so we can keep going back to them and learn about the person.

Dave Weinberger

The importance of the humble URI has been coming up on and off recently. The quote above is from an after-dinner speech Dave Weinberger gave at the Blogging, Journalism & Credibility conference, which I had a listen to on my walk home (the walk home's too short: I had to listen to the end while I was making a cup of tea and some toast). In a discussion on the geo-reasoning mailing list, I commented regarding RDF that,

This could have been avoided if TimBL and co had adopted a well-established knowledge representation language and "webified" it by using URIs for terms, rather than starting over.

To which Malcolm McClure replied,

I think that hypertext links, etc are very much a secondary issue, …

Aaargh. No! URIs — for all their faults — are very much the primary issue. And then there are all those broken web sites. The UK notional rail enquiries site has much more pressing problems to address (it's just broken) but if it worked at all it would still be a nuisance that you can't bookmark a train. Sure, it'd take some thought to work out how to encode a journey in a URI, but it'd be well worth the effort in the usability increase. How do we get this message over: if it doesn't have a URI, it's not on the web. If it matters, give it a URI.
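Encoding a journey in a URI really needn't be hard. Here is a minimal Python sketch using only `urllib.parse` and `datetime`: a journey is an origin, a destination and a departure time, which round-trip cleanly through a query string. The base URL `rail.example` and the parameter names are hypothetical; the station codes are just examples.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://rail.example/journeys"  # hypothetical endpoint
TIME_FORMAT = "%Y-%m-%dT%H:%M"


def journey_uri(origin: str, destination: str, depart: datetime) -> str:
    """Encode a single train journey as a bookmarkable URI."""
    query = urlencode({
        "from": origin,
        "to": destination,
        "depart": depart.strftime(TIME_FORMAT),
    })
    return f"{BASE}?{query}"


def parse_journey_uri(uri: str) -> tuple[str, str, datetime]:
    """Recover (origin, destination, departure) from a journey URI."""
    params = parse_qs(urlsplit(uri).query)
    return (
        params["from"][0],
        params["to"][0],
        datetime.strptime(params["depart"][0], TIME_FORMAT),
    )


uri = journey_uri("NCL", "KGX", datetime(2005, 2, 1, 8, 15))
print(uri)
print(parse_journey_uri(uri))
```

A path such as `/journeys/NCL/KGX/2005-02-01T08:15` would arguably be nicer still; the point is only that the journey can be named at all.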

Update (2005-02-01 (Oh heavens, it's February already!)): Make them nice URIs as well, of course. It's not hard. I found ISAPIrewrite, which has a free limited version, when we were sorting out the School's web site to fix this very problem recently. Apache does full regex-based rewriting out of the box, of course.
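For anyone who hasn't used a rewriter: each rule is a regular expression over the public ("nice") path plus a target into which the captured groups are substituted, and the first matching rule wins. In Apache that's a `RewriteRule` in mod_rewrite; the Python sketch below just shows the same mapping idea so you can see what the engine is doing. The paths, the `.asp` targets and the `rewrite` function are hypothetical.

```python
import re

# Hypothetical rules: (pattern over the public path, internal target).
# Like a rewrite engine, the first matching rule wins and \1, \2 are
# replaced by the captured groups.
RULES = [
    (re.compile(r"^/staff/([a-z]+)/?$"), r"/people.asp?type=staff&id=\1"),
    (re.compile(r"^/courses/(\d{4})/([a-z0-9-]+)/?$"), r"/course.asp?year=\1&code=\2"),
]


def rewrite(path: str) -> str:
    """Return the internal URL for a public path, or the path unchanged."""
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            return match.expand(target)
    return path


print(rewrite("/staff/hamish"))
print(rewrite("/courses/2005/cec2001/"))
print(rewrite("/about"))
```

Users see and bookmark the left-hand side; the scripts behind it can change without breaking anyone's links.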

Amoral Adsense

Over at Bill de hÓra's weblog there's a post about a thesis on distributing RESTian web services. In itself worth linking to for future reference, but what caught my eye were the ads Google has placed at the top.

Custom Dissertations! it says. Custom Unpublished Dissertations in the UK - Quality Guaranteed.

UK University Assignments!

An impressive demonstration of Google's adsense ad placement algorithm, to be sure …

Artificial Intelligence? Pah.

Long ago Dad introduced me to a quote, which he believes to have been first uttered by a head of the AI group at the University of Edinburgh.

Artificial it may be, Intelligence it most certainly isn't. [Source unknown]

I have in the past found Google to be quite effective at locating sources of quotes for me. In this case it is of no assistance whatsoever. If anyone has any leads on this, I'd love to know. If not, you heard it here first.

Spread your wings and fly, little meme.

timeanddate.com

I've just come across timeanddate.com. An essential tool for anyone who has to arrange virtual meetings with people in different countries, like this.


Another David Harvey

Here's an interesting namesake:

I've written articles on Smalltalk, Prolog and Scheme, and reviewed books for the BCS OOPS newsletter and for SIGS European journal Object Expert. I've spoken on C++, and presented workshops on software architecture, architectural styles, patterns and development practice and culture at numerous conferences and events, including several OT conferences, JACC, Object Expo Europe, Unicom, Software Architecture 2000, and JSIG. [David Harvey]

And a good demonstration of why I call myself Hamish. This came up from GoogleAlert. I have alerts for "David Harvey" and for "Hamish Harvey", among others. The "David Harvey" alert is always full of rubbish, including lots of references to the David Harvey who wrote "The Condition of Postmodernity". The "Hamish Harvey" alert surprisingly often turns up items which refer to me (mostly written by me, of course).

Links are an essential element in interfaces

A GUI that doesn't embrace linking can never be truly rich. [Jon Udell]

Danny Ayers has worked up a CSS version (linked from a comment on Sam Ruby's weblog) of a Flash fisheye menu demo which Jon pointed to in the above linked posting. In fact the Flash version does more than the CSS version since, as I understand it, a list of any length will display in the same space, and the fisheye effect makes the items viewable. Notice that the impact of the fisheye effect in the Flash version extends to both ends of the list asymmetrically, shrinking items far away from the cursor as it enlarges those close to it; the CSS version just enlarges the area under the mouse.
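The "fixed space, any length" behaviour comes from normalisation: give each item a weight that falls off with its distance from the cursor, then scale all the weights so the heights sum to the space available. Growing the items near the cursor therefore necessarily shrinks the distant ones, more so at the far end of the list. Here is a minimal Python sketch of that idea; the falloff function and all parameter names are my assumptions, not the demo's actual code.

```python
def fisheye_sizes(n_items, focus, total_height=300.0, magnification=4.0, falloff=2.0):
    """Per-item heights for a fisheye list that always fits total_height.

    Hypothetical weighting: the focused item gets `magnification` times the
    weight of a very distant one, with an inverse-square-style falloff
    controlled by `falloff` (measured in items).
    """
    weights = [
        1.0 + (magnification - 1.0) / (1.0 + (abs(i - focus) / falloff) ** 2)
        for i in range(n_items)
    ]
    # Normalising is what makes far-away items shrink as near ones grow.
    scale = total_height / sum(weights)
    return [w * scale for w in weights]


sizes = fisheye_sizes(50, focus=10)
print(round(sizes[10], 1), round(sizes[0], 1), round(sizes[49], 1))
```

Because the cursor is nearer one end than the other, the two ends of the list shrink by different amounts, which is the asymmetry visible in the Flash version.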

I may be wrong, but it seems to me that the CSS provides a pretty effect, whereas the Flash version provides a genuinely different (though I'm not yet sure if it's useful) way to deal with long lists in short spaces.

Update (22/10/2003): I managed to store Danny Ayers' trackback address as a keyword rather than sending him a ping. It also occurs to me that to say that the CSS is just a pretty effect is too harsh; it does allow the list items to be shrunk considerably. I still think the Flash version is "better" in terms of the visual effect. It is, of course, "worse" because it is in Flash.

Update (22/10/2003): Jon Udell picked up on my comments without the need for trackback.
