Digital information is different

This is beautifully done, and hits a few nails on the head:

Merlin has a few worthwhile comments on the strengths of paper. Also missing, from my point of view, is the fact that a lot of what we do with computers now (pretty much everything that's off the web, for a start) needlessly recreates the limitations of paper and adds a few new problems for good measure. Now where did I put that file ...

Drawn with a very fine camelhair brush

A commenter to a post of Dave Weinberger's left a link to an essay by Jorge Luis Borges, The analytical language of John Wilkins, which includes the quote which so amused Michel Foucault that it helped inspire his writing of The Order of Things (amazon.co.uk, amazon.com):

These ambiguities, redundancies and deficiencies remind us of those which doctor Franz Kuhn attributes to a certain Chinese encyclopaedia entitled 'Celestial Empire of benevolent Knowledge'. In its remote pages it is written that the animals are divided into: (a) belonging to the emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.

Jorge Luis Borges

The rest of the article makes it clear why the quote is significant: it looks ridiculous, so ridiculous that if it catches you at the right angle (it did me) it makes you laugh, hard and out loud, but it is ridiculous in just the same way as a lot of the classification schemes that we use all the time.

It is clear that there is no classification of the Universe not being arbitrary and full of conjectures

Jorge Luis Borges

Reticulation

I've not been paying attention. My earlier note in response to Sean McGrath's comments on hierarchy obviously hit the spot. I've only just noticed that Sean picked it up.

Jon Udell on del.icio.us

Gary King points out a screencast on the "social bookmarking" service del.icio.us by Jon Udell. Great work, as usual, from Jon; his earlier effort on the "Heavy Metal Umlaut" article at Wikipedia was perhaps even more powerful. Both draw out some of the power of Internet-scale social/community tools.

It's particularly amusing to see myself on the small screen: I'm one of the people who posted an article on domain specific languages. I posted it under dsl, because domainspecificlanguage(s) seemed a bit long. As Jon points out, dsl is a bit ambiguous. I could change it, but then I'd move my posts away from everyone else's on domain specific languages.

The problems with tagging remain clear. Supporting these social processes is vital, but the web will not be "semantic" until we, somehow, get beyond slapping simple sets of text strings against items. Ambiguity of acronyms is only the tiniest part of it. Still, they are indeed a powerful tool.

Tag stemming

Well, this won't solve the folksonomy argument, but it is a useful tool. Here's to the power of web service APIs.

Matt Biddulph's tag stemming tool for del.icio.us.
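
For anyone who hasn't met the idea, here is a rough sketch of what stemming tags amounts to. It is not Matt's tool and makes no use of the del.icio.us API; the suffix rules in naive_stem are a crude assumption of mine, and a real stemmer would be a good deal more careful.

```python
from collections import defaultdict


def naive_stem(tag):
    """Crude, hypothetical stemmer: lowercase and strip common plural endings."""
    t = tag.lower()
    if t.endswith("ies") and len(t) > 4:
        return t[:-3] + "y"
    if t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        return t[:-1]
    return t


def group_by_stem(tags):
    """Map each stem to the set of original tags that reduce to it."""
    groups = defaultdict(set)
    for tag in tags:
        groups[naive_stem(tag)].add(tag)
    return dict(groups)


groups = group_by_stem(["folksonomy", "folksonomies", "Tags", "tag", "dsl"])
```

Singular and plural forms end up in one group, which is most of the benefit. Note that it does nothing for dsl: stemming tidies spelling variants, not ambiguity.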


Darwinian metadata

Ben Lund suggests that tags and folksonomies can be seen in a Darwinian light. A nice idea.

The four pre-requisites for a Darwinian process are Reproduction, Variation, Inheritance and Selection. Let's say our individuals are tags and our species is a folksonomy:

  • Reproduction: Certainly. New tags are created with almost every entry, be that bookmark, article, or photo.
  • Variation: Absolutely.
  • Inheritance: Probably. When I copy a bookmark, I often copy the tags too.
  • Selection: Yep. 'Edit tag' was the first feature requested after we launched Connotea.

I think there's some mileage in this idea, but I'm not sure that individual=tag, species=folksonomy works perfectly. Then again, it's an analogy, let's not get too worked up about it. More importantly, "edit tag" isn't enough on the selection front. Tags need to die on a large scale, and that isn't happening. Nor is it obvious how it can happen without things that were tagged no longer being tagged, or old tags being ignored, which seems to lose too much value.

Taxonomy, folksonomy, and value

Let's try again.

Shelley Powers has written a good summary of some arguments about the relative merits of tagging and more formal metadata. Nice pics, too.

Shelley is worried that the rise of "cheap" metadata, in particular tags, will inhibit development of tools to make "real" metadata available to the masses.

Theory: the value of terms in any metadata system used in an uncontrolled way tends to zero with time.

That's just as true for rich, formal so-called “ontologies” as it is for tag soup. There is a difference, however. Tags don't lie about their status. We know, right off, that they're just terms which people have attached to things, for some reason. We know that we don't know what sense the term is meant in, of the undoubtedly many possible. We know that we don't know the intention of the tagger in tagging. The spam problem arises from this: spammers are applying tags in order that pictures show up in certain places, not because they think those tags are relevant. The business of tagging flickr photos with the "offensive" tag is another case in point, where the intention is now to stop items showing up in certain places, and really has nothing to do with offensiveness.

Problematic, perhaps, but not unique to tagging. Exactly the same is true of uncontrolled use of any classification system, however good the tools are which provide access to it. Things will be tagged in a way which is inconsistent with the intended use of any system, either through ignorance or intent.

The rel="nofollow" tag which Google has announced it will use to control the inclusion of links in its PageRank calculations is indeed a nice demonstrator of an important point here. Its meaning is very well defined: it is grounded not in human understanding, but in machine behaviour. Nofollow can't be spammed: either Google honours it, or they don't. The web now has a mechanism for systematically withholding page rank. What it does with it is out of Google's control.
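
To make "grounded in machine behaviour" concrete, here is a minimal sketch of a link collector that honours nofollow. It says nothing about how Google actually does it; LinkCollector and the example.org URLs are made up for illustration, and only Python's standard html.parser is assumed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical crawler step: split links into followed and withheld."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.withheld = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel is a space-separated list of tokens; the value may be missing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.withheld.append(href)
        else:
            self.followed.append(href)


page = (
    '<p><a href="http://example.org/a">a</a> '
    '<a rel="nofollow" href="http://example.org/b">b</a></p>'
)
collector = LinkCollector()
collector.feed(page)
```

The whole meaning of the term is that one branch: a link either lands in the followed list or it doesn't. There is no room for interpretation, which is exactly what distinguishes it from a tag like "offensive".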

Point is, that some terms have meaning in this, mechanical way, like those which make up a programming language. Others have meaning in a fluid, human way. The latter will always be as fluid as they are human; no fancy tools will change that.

Fancy tools to make use of formal taxonomies easier would be great, of course. Very useful for those who care enough to put in the effort to understand, and make sure they stick to, the formal intended meanings of terms. But however easy that is made, it will always be hard, and people will always get it wrong. And as people get it wrong, the extensional definition of the term becomes less precise, and the value of the term tends to zero. You might slow the process down, but you can't, simply can't, stop it.

And, of course, because the effort involved in understanding and carefully applying the terms of a formal taxonomy is relatively high — independently of tools, which can reduce the accidental difficulties but cannot attack the essence — it is just not useful in the sort of "fire and forget" systems like del.icio.us, where any effort is too much.

In the comments to Shelley's post, Nick Sweeney writes:

I’m a bit of a Saussurean about this, in that I think that taxonomy (or ontology, depending upon your disciplinary point of origin) is crystallised/calcified folksonomy. Authorised folksonomy, if you like.

Right on! Those two terms, “crystallised” and “calcified” are a really useful duo. The one has connotations of order, beauty, and value. The other of dull rigidity. Well designed formal taxonomies have the first of those in abundance. Both have the second, and you can't avoid it. A taxonomy can only ever be an encoding of one incomplete and imperfect view of the world.

Shelley:

I agree with Clay that the semantic web is going to be built ‘by the people’, but it won’t be built on chaos. In other words, 100 monkeys typing long enough will NOT write Shakespeare; nor will a 100 million people randomly forming associations create the semantic web.

100 monkeys indeed will not write Shakespeare, but 100 Shakespeares didn't write Shakespeare either. Shakespeare did. The situation we have is that of 100 million people (or, at any rate, some large and growing number) who need tools to organise stuff, and to help them find stuff. 100 million people who will go right on randomly forming associations, whether the results pretend to be anything other than randomly formed associations or not.

We're dealing here not with the formal space of so-called ontologies and logic, but with the messy, human space of language and meaning. The processes involved are, essentially, social. Tools which deny this will fail.

Shelley:

Clay believes that ultimately ontologies will fall to folkonomies, as the latter gain rapid acceptance because of their low cost and ease of use; I believe that ultimately interest in folksonomies will go the way of most memes, in that they’re fun to play with, but eventually we want something that won’t splinter, crack, and stumble the very first day it’s released.

When you design a building to ride out earthquakes with minimal damage, you build in flexibility. Of course, the clever part is in deciding where and how to build in that flexibility. Tagging provides an abundance of flexibility, so much that the building can barely stand up.

But in some ways it works. Delicious is already very useful, on an individual basis, and has some use at a social scale. I follow what other people post about Lisp, Scheme, and Smalltalk for example. These are well defined terms, referring to programming languages rather than highly ambiguous and mis- or differently interpretable concepts. With less well defined concepts, the problems lie as much in the nature of the concepts as in the tagging approach.

Simply aggregating tags a la Technorati as it stands doesn't show a great deal of promise, but it is possible to envisage tools which show tag clouds and, by presenting these to the user, encourage convergence around particular tags. The results will never be complete, never perfect, but always changing, and sometimes useful.
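
The sort of tool I have in mind could be as simple as this sketch: count the tags other people have already used for an item and offer the most popular back to the next person. The function name suggest_tags and the sample data are assumptions for illustration, not any service's API.

```python
from collections import Counter


def suggest_tags(existing_taggings, limit=3):
    """Return the most common tags others have applied to the same item."""
    counts = Counter(
        tag.lower() for tags in existing_taggings for tag in tags
    )
    return [tag for tag, _ in counts.most_common(limit)]


# Each inner list is one person's tags for the same bookmark.
taggings = [
    ["dsl", "programming"],
    ["dsl", "languages"],
    ["domainspecificlanguage", "programming"],
    ["dsl"],
]
suggestions = suggest_tags(taggings, limit=2)
```

Nothing here forces anyone to pick a suggested tag, which is the point: convergence is encouraged, not imposed, and a badly chosen popular tag (dsl again) gets reinforced just as readily as a good one.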

When you have a Web-load of people, things happen from the bottom up. That's the way the Web works.

Losing time

Damn. Firefox ate my homework. I'd just spent some (too much) time commenting on the debate about taxonomies versus folksonomies, particularly as summarised by Shelley Powers, when it crashed, taking my efforts with it.

It crashes quite a lot, really, often after freezing up for a while. If I switch to a different application then switch back, particularly to a page with text fields in a form, it often takes a while to update and come back to life, and sometimes dies before it does so.

I guess I'd better just get used to using a desktop client rather than the TypePad web form to post. Another app to fiddle with proxy settings for depending on whether I'm at home or at work. Rats.

On Hierarchy

A nice line:

Hierarchies. Cannot think within them. Cannot think without them.

Sean McGrath

Arthur Koestler in "The Ghost in the Machine" (amazon.com, amazon.co.uk) talks about arborisation and reticulation as complementary. Hierarchical structuring is powerful, but a closed hierarchy is restrictive. The leaves of the hierarchy (at least) must connect with those of other hierarchies to form powerful structures.

I think Koestler's insight provides quite a conceptual leg-up when thinking about the composition of system representations ("models") with which to drive simulations. Considered as gross components, subsystem representations are assembled in a composition hierarchy which is a straightforward tree. The leaves of this tree are atomic components which are defined in some way other than by composition. Composites do something useful because they define connections between the inputs and outputs of their component parts, and ultimately this always means connecting the inputs and outputs of the atomic leaf components. Thus the modeller is simultaneously defining a tree and a network.
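
A minimal sketch of that double structure, under my own simplifying assumptions: Atomic, Composite, and flatten are hypothetical names, each leaf stands in for its own inputs and outputs, and connections are written as paths relative to the composite that declares them.

```python
from dataclasses import dataclass, field


@dataclass
class Atomic:
    """Leaf component: defined by something other than composition."""
    name: str


@dataclass
class Composite:
    """Interior node: children plus connections between their leaves."""
    name: str
    children: list = field(default_factory=list)
    # Each connection is (source_path, target_path), relative to this node.
    connections: list = field(default_factory=list)


def flatten(node, prefix=""):
    """Return (leaf_paths, edges): the tree's leaves and the network over them."""
    path = prefix + node.name
    if isinstance(node, Atomic):
        return [path], []
    leaves, edges = [], []
    for child in node.children:
        child_leaves, child_edges = flatten(child, path + "/")
        leaves += child_leaves
        edges += child_edges
    edges += [(f"{path}/{src}", f"{path}/{dst}") for src, dst in node.connections]
    return leaves, edges


pump = Composite("pump", [Atomic("motor"), Atomic("impeller")],
                 [("motor", "impeller")])
plant = Composite("plant", [pump, Atomic("tank")],
                  [("pump/impeller", "tank")])
leaves, edges = flatten(plant)
```

The nesting of Composite objects is the tree; the edges that flatten returns are the network, and every edge ends up joining two leaves no matter how high in the hierarchy it was declared.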

When a single-purpose model is being constructed, this will result in a single tree. We can imagine, however, that long-running systems expose the data flows at their leaves in some way (publish and subscribe). In this case it should be possible to define new systems which connect to these flows, and then we start to define a forest of individual trees.

Folksonomies

Says Louis Rosenfeld,

Lately, you can't surf information architecture blogs for five minutes without stumbling on a discussion of folksonomies (there; it happened again!).

There; it happened again. I'm not stopping long enough to comment, just to say that it's good to see this getting some serious thought and debate. It's clear that carefully crafted taxonomies don't work for on-the-fly information organisation; we need something better, something more dynamic.

I think this is largely triggered by the success of del.icio.us (get an account if you haven't already: it's a powerful tool). There are now two similar tools specifically intended for academics, CiteULike and connotea. I'm not done evaluating which of these, if either, has the edge.
