Digital information is different

This is beautifully done, and hits a few nails on the head:

Merlin has a few worthwhile comments on the strengths of paper. Also missing, from my point of view, is the fact that a lot of what we do with computers now (pretty much everything that's off the web, for a start) needlessly recreates the limitations of paper and adds a few new problems for good measure. Now where did I put that file ...

Wilbur lives!

Hooray, Wilbur lives! Jonathan has just noticed that an update was made over the weekend, after a long silence since late 2002. Since we're using it in our project here, this is good news.

Elaboration vs. Layering

Embedded in an ongoing discussion of tuple spaces is a discussion of the nature of HTTP: as a transfer, rather than transport, protocol, or as an application protocol, or …

Patrick Logan makes a point that is easier to get hold of:

The difference is this: the HTTP interface is vague and the Linda interface is specific. Linda has precise, simple semantics. The possible range of behaviors exhibited in Linda-based systems benefit from being layered on *top* of those precise, simple semantics. [Patrick Logan: REST and Linda for Distributed Coordination: Elaboration vs. Layering]

This sounds to me very like the difference between XML and RDF as starting points for building data models. RDF provides much more specific semantics, which means that, assuming applications of RDF stick with those semantics, you can get much further with standard tools. Note that in saying this I am not taking a stance on REST vs. tuple spaces; I don't know enough about either. It's just that the distinction Patrick draws strikes me as familiar.

Aside: The "assuming applications of RDF stick with those semantics" qualification is something I hadn't thought about until Jan Grant commented that my proposed approach to using RDF in MDF was at odds with RDF semantics. I still need to learn more about those semantics in order to avoid trampling on them.

As a general rule then, a layer should aim to support further layers by being precise, simple, but flexible, not by being vague and requiring each specialisation to make elaborations.
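To make the contrast concrete, here is a minimal sketch of a Linda-style tuple space in Python. It supports three of Linda's primitives (out, rd and in, the last spelled in_ because in is a Python keyword) and then layers a shared counter on top of them without touching the layer below. It assumes a single process with threads and uses None as a wildcard in patterns; both are choices made for this illustration, not part of any Linda specification, and the names TupleSpace and increment are hypothetical.

```python
import threading


class TupleSpace:
    """Minimal single-process, Linda-style tuple space (illustrative sketch).

    Patterns are tuples of the same length as the tuples they match;
    None in a pattern matches any value (a convention of this sketch).
    """

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(pattern, tup):
        return len(pattern) == len(tup) and all(
            p is None or p == v for p, v in zip(pattern, tup)
        )

    def _wait_for_match(self, pattern):
        # Caller must hold self._cond; blocks until a matching tuple exists.
        while True:
            for i, tup in enumerate(self._tuples):
                if self._matches(pattern, tup):
                    return i
            self._cond.wait()

    def out(self, tup):
        """Add a tuple and wake any blocked readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def rd(self, pattern):
        """Return a matching tuple, leaving it in the space."""
        with self._cond:
            return self._tuples[self._wait_for_match(pattern)]

    def in_(self, pattern):
        """Remove and return a matching tuple (atomic with respect to other callers)."""
        with self._cond:
            return self._tuples.pop(self._wait_for_match(pattern))


# A higher layer built only from the primitives above: an atomic counter.
def increment(space, name):
    _, value = space.in_((name, None))  # take the counter out; others now block
    space.out((name, value + 1))        # put the updated counter back
    return value + 1


if __name__ == "__main__":
    space = TupleSpace()
    space.out(("hits", 0))
    workers = [
        threading.Thread(target=lambda: [increment(space, "hits") for _ in range(100)])
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(space.rd(("hits", None)))  # ('hits', 400)
```

The counter is safe under concurrency only because in_ removes the tuple atomically; the higher layer leans on that one precise guarantee rather than on an interpretation of looser semantics.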

UI is Better than AI

In an article about the differences between desktop and server based RSS aggregators, William Grosso coins the phrase

UI is Better than AI.

which is a pleasingly catchy encapsulation of an idea I was thinking about during my first weblogging experiment, and which fits a theme I have returned to recently. The first of the articles linked above includes two quotes from a paper in the Journal of Hydraulic Research (the IAHR web site is a royal PITA, with Frames-and-Flash-based navigation which doesn't work in Konqueror and doesn't allow deep linking without picking frames out of context). Since that post is on a weblog which could in theory disappear at any time, I repeat them here.

Knowledge based systems may support certain tasks, but if the task is restructured, the need of much of the knowledge about the complexity might become obsolete. Much of the knowledge which is still needed now is simply an artefact of certain solution methods or of the use of certain computer programs. Modification of the solution process and improvement of the tools tend to make the task less knowledge-intensive. [p. 91, van Zuylen, Dee, Mynett, Rodenhuis, Moll, Ogink Most, Gerritsen, Verboom (1994) "Hydroinformatics at Delft Hydraulics", J. Hydraulic Research, Extra Issue: Hydroinformatics]

Tools were proposed to provide knowledge in a suitable form ... Once it became clear what kind of tools could be implemented, the developer and the expert/user restructured the task in such a way that it became easier, less complex and less knowledge-intensive. A user interface was then developed for this task. Clearly, a straightforward development of a user interface containing a knowledge base would have resulted in a sub-optimal product, since the product would have supported an obsolete task. [p. 93, van Zuylen, Dee, Mynett, Rodenhuis, Moll, Ogink Most, Gerritsen, Verboom (1994) "Hydroinformatics at Delft Hydraulics", J. Hydraulic Research, Extra Issue: Hydroinformatics]

Data Emergence and not throwing information away

Danny Ayers points out a neat summary by Robin Good of ideas about "Data Emergence".

The phrase "data emergence" really captures only one aspect of the process. Something can emerge only after a critical mass of data has been collected. This creates a tricky Catch-22: in order for a database to be "aided through normal, selfish use" (Dan Bricklin, cited by Jon Udell, cited in turn by Robin Good), a database must exist to be used in the first place. The rule is also harder to apply where the "database", such as it is, is distributed.

That is the sort of situation which occurs in supporting the development of (simulation) models of the physical environment and the associated data processing, which I talked about this week in Bristol (abstract, slides [PDF, 320Kb], for what it's worth) and will talk about again next week in Delft (I'll post the updated slides when I get back, and I need to start writing this up as a paper for Hydroinformatics 2004 soon).

In that talk I said, "Don't throw information away." Throwing information away is exactly what happens all the time in model development and application at the moment (it happens everywhere, but let's stick with my pet case study). Raw data (for example from remote sensing or ground survey) are processed and the processed data used for some purpose, but the processing steps, and references to the raw data from which the processed data were derived, are discarded (or "not kept"; in this case sins of omission and commission can, I think, be conflated without concern, since it is clear to everyone that this information is, or will in the future be, critical).

Often even the raw data are discarded. I think this is true of at least some of the weather radars operated by the Met Office here in the UK. In this case it is a hangover from days of yore, when storage on that scale was beyond the reach even of a national meteorological office, but it needs fixing fast.

Closer to home, at the last progress meeting of the Next Generation of Flood Inundation Models project, it was observed that some of the data acquired for the project are billed as "geo-referenced", but it is quite unclear what is meant by this, and quite unlikely that any reasonably strict definition of geo-referenced could be applied. This example draws attention to the fact that linguistic descriptions of processing steps are not enough: such descriptions, if they are made at all, will most likely be so minimal and questionable as to be effectively content-free.

The frustrating thing here is that these processing steps are almost invariably applied with the aid of software, and that software could, if the appropriate frameworks were in place, keep track of this information without placing additional demands on the user (and so without being ambushed by Doctorow's Metacrap straw men).
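As a rough illustration of software keeping track without asking anything extra of the user, the sketch below wraps processing steps in a Python decorator that appends a record of each step, with all its parameter values, to the history of the data set it produces. DataSet, records_provenance, scale and clip are hypothetical names invented for this example, not part of any existing framework.

```python
import functools
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class DataSet:
    """Hypothetical container: values plus the history of steps that produced them."""
    values: Any
    history: List[dict] = field(default_factory=list)


def records_provenance(step: Callable) -> Callable:
    """Wrap a processing step so that every application is recorded automatically.

    The wrapped step takes a DataSet and returns a new DataSet whose history is
    the input's history plus one entry naming the step and all its parameter
    values (defaults included). The input DataSet is left unchanged.
    """
    sig = inspect.signature(step)
    data_param = next(iter(sig.parameters))  # first parameter receives the values

    @functools.wraps(step)
    def wrapper(data: DataSet, **params) -> DataSet:
        bound = sig.bind(data.values, **params)
        bound.apply_defaults()
        recorded = {k: v for k, v in bound.arguments.items() if k != data_param}
        result = step(data.values, **params)
        return DataSet(result, data.history + [{"step": step.__name__, "params": recorded}])

    return wrapper


@records_provenance
def scale(values, factor=1.0):
    return [v * factor for v in values]


@records_provenance
def clip(values, lo=float("-inf"), hi=float("inf")):
    return [min(max(v, lo), hi) for v in values]


if __name__ == "__main__":
    raw = DataSet([1.0, 5.0, 12.0])
    processed = clip(scale(raw, factor=2.0), hi=20.0)
    print(processed.values)  # [2.0, 10.0, 20.0]
    for entry in processed.history:
        print(entry)
```

The user just calls scale and clip as usual; the history, including default values that were silently applied, comes along for free.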

Think of a persistent undo facility, where each data set carries its own processing history with it. The undo analogy isn't perfect, since in many cases knowing a forward transformation does not imply an ability to reverse it, and (as was emphasised to me after I made over-optimistic claims regarding the rate of decrease of mass storage costs without allowing for increasing demand) keeping each intermediate step is still prohibitively expensive. It is, however, plausible that checkpoints could be kept, and intermediate stages could be recreated by following the processing sequence forward from the nearest checkpoint.
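The checkpoint idea can be sketched in a few lines, assuming every processing step is a deterministic function that returns new data rather than modifying its input. The hypothetical ProcessingTrail class below always keeps the raw data and the full sequence of steps, but stores only every n-th intermediate result (n is the every parameter); any other stage is recreated by replaying forward from the nearest earlier checkpoint, so every trades storage against recomputation.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[..., Any], Dict[str, Any]]


class ProcessingTrail:
    """Hypothetical persistent processing history with periodic checkpoints.

    The raw data (stage 0) and the full list of steps are always kept, but only
    every `every`-th intermediate result is stored. Assumes each step is a
    deterministic function that returns new data rather than modifying its input.
    """

    def __init__(self, raw: Any, every: int = 3):
        self.every = every
        self.steps: List[Step] = []
        self.checkpoints: Dict[int, Any] = {0: raw}  # stage number -> stored data
        self._latest = raw

    def apply(self, name: str, func: Callable[..., Any], **params: Any) -> Any:
        """Run one step forward, record it, and checkpoint if due."""
        self._latest = func(self._latest, **params)
        self.steps.append((name, func, params))
        stage = len(self.steps)
        if stage % self.every == 0:
            self.checkpoints[stage] = self._latest
        return self._latest

    def nearest_checkpoint(self, stage: int) -> int:
        return max(k for k in self.checkpoints if k <= stage)

    def recreate(self, stage: int) -> Any:
        """Rebuild the data as it was after `stage` steps by replaying forward."""
        if not 0 <= stage <= len(self.steps):
            raise IndexError(f"no stage {stage}")
        start = self.nearest_checkpoint(stage)
        data = self.checkpoints[start]
        for _name, func, params in self.steps[start:stage]:
            data = func(data, **params)
        return data

    def describe(self) -> List[str]:
        """Human-readable trail: one line per step, in order."""
        return [f"{i + 1}: {name} {params}" for i, (name, _f, params) in enumerate(self.steps)]
```

With every=2 and three steps applied, only stages 0 and 2 are stored; asking for stage 3 replays a single step from the stage 2 checkpoint, and asking for stage 1 replays one step from the raw data.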

These trails should be firmly attached to the data set whose derivation they describe, so that when someone is handed that data set in twenty years' time the trail is still there. This might at first seem to be at odds with Earl Mardle's comment on an earlier metadata-related post of mine.

Precisely. And for it to be worth anything, it must also be held separately from the original data. That way, others can contribute to the development of the metadata or annotation, of the document. [Earl Mardle: Metadata As Web Service]

Of course it isn't really at odds. If (no small if, but let's not get stuck here for now) I can refer to a given data set using a URI, then I can say things about it anywhere I want to, whether I own the data set or not, as can anyone else. I can decide which of the statements other people have made about that data set (those which are visible to me) I want to make use of. But it is essential that if I have access to the data itself, I also have access to information about its provenance, and it makes little sense to do other than trust the supplier of the data to supply that.
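A toy sketch of that arrangement, assuming statements are simple (subject, predicate, object) triples tagged with whoever made them: anyone can add statements about a data set's URI, and each consumer chooses which sources to listen to, typically including the data supplier for provenance. StatementPool and the example URIs are hypothetical; a real system would more likely use RDF and a proper triple store.

```python
from typing import Iterable, List, NamedTuple, Optional


class Statement(NamedTuple):
    subject: str    # URI of the thing the statement is about
    predicate: str
    obj: str
    source: str     # who made the statement


class StatementPool:
    """Hypothetical pool of statements about URIs, made by anyone."""

    def __init__(self) -> None:
        self._statements: List[Statement] = []

    def say(self, source: str, subject: str, predicate: str, obj: str) -> None:
        self._statements.append(Statement(subject, predicate, obj, source))

    def about(self, subject: str, trusted: Optional[Iterable[str]] = None) -> List[Statement]:
        """Statements about `subject`, optionally only from the sources I choose to trust."""
        allowed = None if trusted is None else set(trusted)
        return [
            s for s in self._statements
            if s.subject == subject and (allowed is None or s.source in allowed)
        ]


if __name__ == "__main__":
    dem = "http://example.org/data/dem-survey-1"  # hypothetical data set URI
    pool = StatementPool()
    pool.say("supplier", dem, "derivedFrom", "urn:example:raw-lidar-tiles")
    pool.say("alice", dem, "comment", "vertical datum unclear")
    pool.say("bob", dem, "comment", "fine for 1 km grids")
    for s in pool.about(dem, trusted={"supplier", "alice"}):
        print(s.source, s.predicate, s.obj)
```

Nobody needs to own the data set to annotate it, but the provenance statement only carries weight because it comes from the supplier.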

Code as a repository of knowledge

I added the "code is a repository of knowledge, but it's impossible to get that knowledge back out" theme to my ever-growing collection of soap boxes a while ago. I've just come across the thought in a rather different context.

Even though code makes for a description of solutions in a fairly objective language, the bottom line is that it makes for a poor repository of knowledge when that language becomes an endangered species. [Lambda the Ultimate]

It's good to see the idea crop up, but this presentation only goes part of the way. Chris seems to be of the opinion that the problem would be solved if we could translate code written in an "endangered" language into a healthier one (Common Lisp to Dylan would seem to be translating from the less to the more endangered language, but that's not the point).

Language translation efforts seem to rarely succeed. I've had a few successes here and there, but an intense familiarity with both languages as well as the software is required. [Lambda the Ultimate]

I'm not convinced. In the case of models of environmental processes, for example, a huge amount of knowledge is embedded in software implementations of those models, and reading the code is more or less the only way of getting hold of that knowledge. Descriptions of models tend to be ambiguous or incomplete, and they rarely describe in detail how the model is implemented as actual, running code. The process of translating a mathematical model into a numerical one, and on to functioning code, generally results in code which completely masks the structure of the model.

Code is a dreadful knowledge representation format, not just because a language dying out leaves you with no way to make use of the knowledge it represents, but because there is no good way to recover that knowledge from its representation back into a human brain, where it can do some good.

This is why I find syntactic abstraction exciting. It is also why I think that using Communicating Sequential Processes has real potential as a software structuring technique for models: isolate the sequential bits, and then describe the composition in declarative terms.
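A minimal sketch of that structuring, under stated assumptions: each sequential process is an ordinary Python function that reads from and writes to channels, and the composition is a declarative table saying which process reads and writes which channel. The toy rainfall, runoff and storage processes are hypothetical, and for brevity the channels are unbounded queue.Queue objects with None as an end-of-stream marker, rather than the synchronous rendezvous channels of CSP proper.

```python
import queue
import threading

# Each process is ordinary sequential code that talks only to its channels.
# None marks end-of-stream (a convention of this sketch, not of CSP).


def rainfall(out, values):
    for v in values:
        out.put(v)
    out.put(None)


def runoff(inp, out, coefficient):
    while (rain := inp.get()) is not None:
        out.put(rain * coefficient)
    out.put(None)


def storage(inp, out):
    total = 0.0
    while (flow := inp.get()) is not None:
        total += flow
        out.put(total)
    out.put(None)


# Declarative composition: (process, input channels, output channels, parameters).
NETWORK = [
    (rainfall, [], ["rain"], {"values": [1.0, 2.0, 0.0, 4.0]}),
    (runoff, ["rain"], ["flow"], {"coefficient": 0.5}),
    (storage, ["flow"], ["volume"], {}),
]


def run(network, result_channel):
    """Create one channel per name, start one thread per process, collect results."""
    channels = {}
    for _proc, ins, outs, _params in network:
        for name in ins + outs:
            channels.setdefault(name, queue.Queue())
    threads = [
        threading.Thread(target=proc, args=[channels[c] for c in ins + outs], kwargs=params)
        for proc, ins, outs, params in network
    ]
    for t in threads:
        t.start()
    results = []
    while (item := channels[result_channel].get()) is not None:
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run(NETWORK, "volume"))  # [0.5, 1.5, 1.5, 3.5]
```

The model structure lives in NETWORK, which can be read (or rewired) without digging through the numerical code inside each process. The walrus operator needs Python 3.8 or later.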

Artificial Intelligence? Pah.

Long ago Dad introduced me to a quote, which he believes to have been first uttered by a head of the AI group at the University of Edinburgh.

Artificial it may be, Intelligence it most certainly isn't. [Source unknown]

I have in the past found Google to be quite effective at locating sources of quotes for me. In this case it is of no assistance whatsoever. If anyone has any leads on this, I'd love to know. If not, you heard it here first.

Spread your wings and fly, little meme.

Confusing a thing with a web page about it

I notice the following on Libby Miller's weblog, referring to the RSS+events module.

I'm really pleased to see that it's been updated so that the event is a 'thing' in itself and isn't confused with the webpage describing it … [Libby Miller]

Which I am noting here mostly so that I can find it again if necessary, but also because it hints at something which has been bugging me about RDF, but which is (I think) more cleanly dealt with in Topic Maps.

If you use a URI to represent a non-addressable object like a person, then it had better not be the URI of an addressable object (like a web page), because if it is, you can't distinguish between statements about the person and statements about the web page. In Topic Maps you can talk unambiguously about both the resource at a URI and the thing indicated by a URI, so, for example, I could make statements about my official web page and also make statements using the URI of that page as a subject indicator for me.
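The distinction can be mimicked by recording, alongside each URI, how it is being used: as the address of the resource itself (a locator) or as an indicator of some other subject. This loosely mirrors the subject locator / subject identifier distinction in the Topic Maps Data Model, but the Subject class, the statements and the example.org URI below are hypothetical and not any Topic Maps API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Subject:
    """Records how a URI is used to pick out a subject (a sketch, not a Topic Maps API).

    kind == "locator":   the subject is the resource at the URI (e.g. a web page).
    kind == "indicator": the resource at the URI only indicates the subject
                         (e.g. the person that page is about).
    """
    kind: str
    uri: str

    def __post_init__(self) -> None:
        if self.kind not in ("locator", "indicator"):
            raise ValueError(f"unknown kind: {self.kind!r}")


Triple = Tuple[Subject, str, str]


def about(subject: Subject, statements: List[Triple]) -> List[Tuple[str, str]]:
    """(predicate, object) pairs asserted about exactly this subject."""
    return [(p, o) for s, p, o in statements if s == subject]


if __name__ == "__main__":
    page = "http://example.org/~someone/"  # hypothetical home page
    statements = [
        (Subject("locator", page), "format", "text/html"),
        (Subject("indicator", page), "name", "Someone"),
    ]
    print(about(Subject("locator", page), statements))    # [('format', 'text/html')]
    print(about(Subject("indicator", page), statements))  # [('name', 'Someone')]
```

The same URI string appears in both statements, yet a statement about the page can never be mistaken for one about the person.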

Ah. In her previous post Libby comments more fully:

My experiments with RDFical and RSS 1.0 use foaf:topic to separate the RSS 1.0 feed item (with its url) from the event itself (which might have a homepage, but is not itself a url). This issue is analogous to people and urls. People are not webpages though they may have homepages. Events are not webpages, though they may have homepages and other pages about them.

Chris was arguing that the rss link did not have to be a url, but could be a non-url uri, and therefore not confusable with a webpage. Ok, that seems more reasonable, although I worry that people will in fact tend to use actual urls of webpages, especially because that's what RSS is designed for. [Libby Miller]

I think that when it comes to naming models, the worry expressed here isn't an issue. There is no existing tradition of having a well-defined web page per model (as distinct from modelling tool) or per model implementation, so a format which declares that models shall be given URIs which are not URLs of web-retrievable objects should be fine.
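One way such a rule might be expressed, as a sketch only: mint model identifiers as urn:uuid URIs, which do not dereference to a web page, and reject identifiers whose scheme normally names a retrievable resource. The function names and the set of rejected schemes are assumptions for illustration, not part of any existing format.

```python
import uuid
from urllib.parse import urlparse

# Schemes whose URIs normally dereference to a retrievable document.
# The list is an assumption of this sketch, not taken from any standard.
RETRIEVABLE_SCHEMES = {"http", "https", "ftp", "file"}


def mint_model_uri() -> str:
    """Mint a fresh identifier for a model in urn:uuid form."""
    return uuid.uuid4().urn


def is_acceptable_model_uri(uri: str) -> bool:
    """Hypothetical format rule: a scheme is required, but not a retrievable one."""
    scheme = urlparse(uri).scheme.lower()
    return bool(scheme) and scheme not in RETRIEVABLE_SCHEMES


if __name__ == "__main__":
    model_uri = mint_model_uri()
    print(model_uri, is_acceptable_model_uri(model_uri))
    print(is_acceptable_model_uri("http://example.org/models/flood"))  # False
```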

Exploratory Modelling

An interesting conversation over coffee just now. The context was how one might go about constructing models of geomorphological systems, and how the qualitative descriptions of geomorphologists might be tested and informed by the interactive development of quantitative expressions of those descriptions.

Grep tells me that I didn't use the phrase "exploratory modelling" in my PhD thesis, though I thought I had. Let's post it here.

Google tells me that it's not new. Which is great; people to talk to. Why didn't I run this search before? Zarine Kemp looks to be doing some interesting work, and has some publications I need to follow up.


Comments from Ari Jolma on my Open Source paper

My recent paper on the potential value of open source software in Hydroinformatics has stimulated another response (in addition, that is, to those published in the Journal from Profs. Mike Abbott and Jean Cunge). I will respond to Ari Jolma's comments here, quoting from his email with permission.

I read with great interest your paper in J. Hydroinformatics. There surely is a need for free software and open and useful standards in Hydroinformatics.

The word "useful" in this sentence, apparently so innocuous, is critical, I think. I managed to resist all but the briefest comment on this subject in my thesis, because it was clearly off topic, but that brief comment I did feel compelled to make. "Interoperability, Not Standards" is Clay Shirky's mantra. Shirky was talking about premature standardisation in the context of peer-to-peer software, but his conclusions seem valid for the hydroinformatics world too.

