Taxonomy, folksonomy, and value

Let's try again.

Shelley Powers has written a good summary of some arguments about the relative merits of tagging and more formal metadata. Nice pics, too.

Shelley is worried that the rise of "cheap" metadata, in particular tags, will inhibit development of tools to make "real" metadata available to the masses.

Theory: the value of terms in any metadata system used in an uncontrolled way tends to zero with time.

That's just as true for rich, formal so-called “ontologies” as it is for tag soup. There is a difference, however. Tags don't lie about their status. We know, right off, that they're just terms which people have attached to things, for some reason. We know that we don't know what sense the term is meant in, of the undoubtedly many possible. We know that we don't know the intention of the tagger in tagging. The spam problem arises from this: spammers are applying tags in order that pictures show up in certain places, not because they think those tags are relevant. The business of tagging flickr photos with the "offensive" tag is another case in point, where the intention is now to stop items showing up in certain places, and really has nothing to do with offensiveness.

Problematic, perhaps, but not unique to tagging. Exactly the same is true of uncontrolled use of any classification system, however good the tools are which provide access to it. Things will be tagged in a way which is inconsistent with the intended use of any system, either through ignorance or intent.

The rel="nofollow" attribute which Google has announced it will use to control the inclusion of links in its PageRank calculations is indeed a nice demonstrator of an important point here. Its meaning is very well defined: it is grounded not in human understanding, but in machine behaviour. Nofollow can't be spammed: either Google honours it, or they don't. The web now has a mechanism for systematically withholding page rank. What it does with it is out of Google's control.
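To make "grounded in machine behaviour" concrete, here is a minimal sketch of what honouring nofollow might look like on the consuming side. It is not Google's implementation, which isn't public; the names LinkCollector and split_links are made up for illustration, and the only assumption is that a link whose rel attribute contains the token "nofollow" is left out of whatever gets counted.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: sorts the links on a page by whether they carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.counted = []   # links a ranking calculation would be allowed to use
        self.withheld = []  # links the page author has asked not to be counted

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel is a space-separated list of tokens; a valueless attribute arrives as None.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.withheld.append(href)
        else:
            self.counted.append(href)


def split_links(html):
    """Return (counted, withheld) lists of hrefs found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.counted, collector.withheld
```

The term means exactly what the code does with it and nothing more, which is the contrast with a tag like "offensive".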

The point is that some terms have meaning in this mechanical way, like those which make up a programming language. Others have meaning in a fluid, human way. The latter will always be as fluid as they are human; no fancy tools will change that.

Fancy tools to make use of formal taxonomies easier would be great, of course. Very useful for those who care enough to put in the effort to understand, and make sure they stick to, the formal intended meanings of terms. But however easy that is made, it will always be hard, and people will always get it wrong. And as people get it wrong, the extensional definition of the term becomes less precise, and the value of the term tends to zero. You might slow the process down, but you can't, simply can't, stop it.

And, of course, because the effort involved in understanding and carefully applying the terms of a formal taxonomy is relatively high — independently of tools, which can reduce the accidental difficulties but cannot attack the essence — it is just not useful in the sort of "fire and forget" systems like del.icio.us, where any effort is too much.

In the comments to Shelley's post, Nick Sweeney writes:

I’m a bit of a Saussurean about this, in that I think that taxonomy (or ontology, depending upon your disciplinary point of origin) is crystallised/calcified folksonomy. Authorised folksonomy, if you like.

Right on! Those two terms, “crystallised” and “calcified” are a really useful duo. The one has connotations of order, beauty, and value. The other of dull rigidity. Well designed formal taxonomies have the first of those in abundance. Both have the second, and you can't avoid it. A taxonomy can only ever be an encoding of one incomplete and imperfect view of the world.

Shelley:

I agree with Clay that the semantic web is going to be built ‘by the people’, but it won’t be built on chaos. In other words, 100 monkeys typing long enough will NOT write Shakespeare; nor will a 100 million people randomly forming associations create the semantic web.

100 monkeys indeed will not write Shakespeare, but 100 Shakespeares didn't write Shakespeare either. Shakespeare did. The situation we have is that of 100 million people (or, at any rate, some large and growing number) who need tools to organise stuff, and to help them find stuff. 100 million people who will go right on randomly forming associations, whether the results pretend to be anything other than randomly formed associations or not.

We're dealing here not with the formal space of so-called ontologies and logic, but with the messy, human space of language and meaning. The processes involved are essentially social. Tools which deny this will fail.

Shelley:

Clay believes that ultimately ontologies will fall to folkonomies, as the latter gain rapid acceptance because of their low cost and ease of use; I believe that ultimately interest in folksonomies will go the way of most memes, in that they’re fun to play with, but eventually we want something that won’t splinter, crack, and stumble the very first day it’s released.

When you design a building to ride out earthquakes with minimal damage, you build in flexibility. Of course, the clever part is in deciding where and how to build in that flexibility. Tagging provides an abundance of flexibility, so much that the building can barely stand up.

But in some ways it works. Delicious is already very useful, on an individual basis, and has some use at a social scale. I follow what other people post about Lisp, Scheme, and Smalltalk for example. These are well defined terms, referring to programming languages rather than highly ambiguous and mis- or differently interpretable concepts. With less well defined concepts, the problems lie as much in the nature of the concepts as in the tagging approach.

Simply aggregating tags a la Technorati as it stands doesn't show a great deal of promise, but it is possible to envisage tools which show tag clouds and, by presenting these to the user, encourage convergence around particular tags. The results will never be complete, never perfect, but always changing, and sometimes useful.
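As a rough sketch of the kind of tool I have in mind (the function name tag_cloud and the shape of the input are assumptions for illustration, not anything del.icio.us or Technorati actually expose), a cloud is little more than a count of how many postings use each tag, scaled against the most popular one:

```python
from collections import Counter


def tag_cloud(postings, top=10):
    """postings: iterable of (user, [tags]) pairs.

    Returns up to `top` (tag, weight) pairs, most used first,
    where weight is the tag's count divided by the largest count.
    """
    counts = Counter()
    for _user, tags in postings:
        # Count each tag once per posting, ignoring case and stray whitespace.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    if not counts:
        return []
    biggest = max(counts.values())
    return [(tag, n / biggest) for tag, n in counts.most_common(top)]
```

Showing the heaviest tags back to someone as they type is the nudge towards convergence; nothing here says anything about what a tag means.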

When you have a Web-load of people, things happen from the bottom up. That's the way the Web works.

So farewell then, Blunkett

So after all this silence, I can't let this momentous event pass without comment. Sadly, Blunkett has been pushed — and what should have been a time of debate has been totally suppressed by — an utterly insignificant (ab)use of power, while his programme of turning the UK even further into a police state is not to be modified. Still, perhaps Clarke won't make public plans to undermine one of the fundamental safeguards of the justice system while well out of earshot abroad. Maybe. We'll see. Meanwhile, YudelLine independently invented the name that Pete coined last year. There must be a Platonic Form out there.

C'mon Clarke. If the mechanisms are implemented perfectly, ID cards can't do what is offhandedly claimed for them. It's a lie. And the mechanisms, we all know perfectly well, won't be implemented perfectly. It's based, at bottom, on the idea of a perfect database: a repository of manifestations of — again — Platonic Forms of identity. Get real. Drop it.

All we can hope now is that they employ EDS to implement the thing (here's why, if you don't know already).

MP web logs

It seems that a few MPs have web logs. I came across Clive Soley's (RSS) a few weeks ago (and couldn't resist diving in to the tuition fees debate despite not living in Shepherd's Bush). I've just followed links from there to Tom Watson's (MP for the confusingly named West Bromwich East, apparently the first blogging MP; RSS) and Richard Allan's (MP for Sheffield Hallam, with a note on the front page at the moment about patent madness - good to know there's someone on the inside with some understanding; RSS).

Interesting stuff. Jean Corston, as far as I know, doesn't have one.

Comment spam aside, a weblog seems to me an excellent way of presenting your views to and collecting input from (well educated, well heeled, tech savvy ...) constituents. That parenthetical comment might be alleviated a little, too, by efforts such as those in Easton of Bristol Wireless. Jean Corston is pretty good at responding to comments, although the (presumably procedural) need to reply on paper slows things down rather and doesn't encourage conversation. It seems to me that a constituent might reasonably ask to be kept abreast of their MP's views on matters of current debate, in order that they can enter into some meaningful debate with that person. A set of undifferentiated votes, in any case at least partly based on national mood and the popularity of the candidate's party, every 4-5 years is certainly not enough information for an MP to base an assessment of their constituents' views on.

Perhaps I should write and suggest blogging …

Citation and influence: science versus the blogosphere

Jon Udell observes that communication in science is still predominantly off-line or private (access restricted journals, and email).

Beyond the computer-science-related disciplines, though, it's unclear to me how much scientific content is becoming freely available online, and therefore able to benefit from the powerful knowledge-transmission and reputation-building forces at work in the blogosphere.

One of the problems here is that academics who are not working in computer science (and no doubt some who are) are not aware of the tools, and haven't thought about what better tools could do. And I am finding, as I periodically try to do something about this, that there are very good reasons for this situation. There is still way too much expertise needed to get things set up and working smoothly enough that you can go to a busy academic and present them with a compelling proposition for investing time in learning the tools.

In the case of weblogs, and some recent introductions to the concepts and technology notwithstanding, there's still a huge barrier to entry in just explaining enough of the concepts for anyone other than a relentless self-publicist to see the value of learning.

When blogs get really popular

While there are a hell of a lot of blogs and blog readers, blogs aren't even close to being a mainstream phenomenon the way email is. It'll happen. And here are some guesses (note: guesses) about what they'll look like when they do. [David Weinberger]

Some interesting points here. 2 and 3 (the rise of group blogs, and blogs as discussion) are why I keep meaning to get round to setting up a weblog for friends to post to. The sort of stuff that goes round to a pretty arbitrarily selected bunch of people ("look at this site", "here are some pictures of event X") could just as easily go on a group weblog in a more inclusive, and searchable, way.

Another "I'm getting round to it", but this time I'm getting round to it today, just as soon as I stop writing this, is setting up a weblog (or set of weblogs — not sure yet) for a research project I'm involved with. This project is using remote sensing data in river modelling, and exploring how much can be done with recently available data and a minimum of site visits. I'm not involved in that side of things, but we need to have a record of the problems encountered by those doing the modelling work as they go along. I want to get them to jot down a quick paragraph every time they encounter a situation which would have been easier to resolve if some more metadata was available. This merges group blogs, blogs as discussions, time-limited blogs (point 6, limited to the duration of the project). It might or might not be closed circulation; I'd prefer open, but since it's not me who will be blogging, we'll have to see.

Andrew Orlowski in San Francisco

The cheek of this man is incredible. First he makes snide remarks about webloggers (as mentioned previously).

Even with such fine wits as O'Brien and Foster in attendance, it's hard to imagine that this won't break down into a consensual circle-jerk. Such is the world of blogdom, which relies heavily on its own, synthetic consensus.

Then he uses them for cheap column inches.

Orlowski on BloggerCon

Andrew Orlowski looks to be trying to stir up protest from bloggers-about-blogging again. Orlowski, of course, is doing well from trawling "blogspace" and rewriting other people's objections.

The main point is sound: I can't imagine why anyone would shell out $500 to spend a weekend talking about blogging.

The comparison of weblog tools with DTP tools isn't really a good one. The quantitative difference in ease between publishing with DTP software, a printer, and stamps and publishing with weblog software is so large as to become a qualitative difference. And DTP software demanded that its user be competent in design to produce the smallest newsletter: blocks of text must be fitted together round the page, and so on. The result of this is that the uninitiated made use of the 300 fonts they had available, and produced abominations. With weblogs, the format leads towards uniformity, not away from it: each story is styled the same way, and posts are displayed in a single column.

Well, unlike the HTML coders who populate the blogging-about-blogging part of blogdom, these excellent, content-first webloggers spend little time congratulating themselves on their choice of medium. None of them use sticky weblog distractions (sorry, 'innovations!') such as Trackbacks, which cause so much grief for Google users. In fact, they spend most of their time writing well, rather than congratulating themselves for being "bloggers".

All well and good. If you aren't interested in blogging software, don't read the blogs written by the people making it. The beauty of this stuff is that you can ignore it if you're not interested. As for trackbacks, I think pretty much everyone would acknowledge that they are far from perfect. But to blame the people who use them for skewing Google's results is downright absurd. The algorithm doesn't work, so the raw material must be broken.

I bet you there are magazines about newspaper printing, and I bet you they talk about ink:

The medium is not the message. Imagine how tedious newspapers would be if every other story proclaimed "We use INK!!!" The writers don't care, and the readers don't care, how this message was delivered: but readers do care about quality.

The "it's only a tool, it's what you do with it that matters" argument is all well and good, but it fails to notice that tools have an enormous impact on their users. So tools are important. And talking about tools is important. And it will always remain so.

I remain unsure what BloggerCon was supposed to be about, and for. It isn't a gathering of techies, the guys developing the software which enables blogging. So you might see it as an attempt by the techies to get the users talking about how they use the tools. That has the potential to be useful, but the problem is that the users are unlikely to want to pay for the privilege. And because most weblog software is free -- either as in speech or as in beer -- the techies can't pay either.

Give up, I say. Blogspace plus the odd wiki seems an effective way of collaborating.

Blogging from the pub

Ah, technology. Post to the web from my mobile, and check the result from the Bristol Wireless connected computers in the pub.
