Update (06/11/2003): It's broken already. It's choking on something in this grant page.
Update (06/11/2003): OK, that's fixed now. There's a new TODO as well, though.
Well, a weekend marathon learning and coding session comes to a reasonably satisfactory close, except for the fact that there was a pile of other things I needed to be doing.
Note to UK academics who don't know what RSS is: read on, I think you'll be glad to find out, and I explain it towards the end. It'd be great if anyone who experiments, either with RSS and aggregators in general, or with the EPSRC latest grant awards feed in particular, could let me know how they get on -- positive or negative -- either by email or, better, by leaving a comment on this item.
There is now a trial RSS 1.0 feed of the latest grants awarded by the EPSRC. The content is scraped (using Bigloo, SXML, and HtmlPrag) and rendered as a valid RSS 1.0 feed. The RSS contains some ad hoc extensions; these are artefacts of experiments with converting the EPSRC grants information into more general RDF.
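For anyone wondering what "rendered as a valid RSS 1.0 feed" involves, here is a minimal sketch in Python rather than the Bigloo/SXML code the feed actually uses. The `build_feed` function, the grant dictionary keys and the example.org URLs are all made up for illustration, and the ad hoc extensions are left out. It shows only the basic RSS 1.0 shape: a channel listing its items in an `rdf:Seq`, followed by the items themselves.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Serialise RSS elements in the default namespace and RDF names as rdf:...
ET.register_namespace("", RSS)
ET.register_namespace("rdf", RDF)


def build_feed(channel_url, title, description, grants):
    """Return an RSS 1.0 document as a string.

    Each grant is assumed (for this sketch) to be a dict with
    "url", "title" and "summary" keys.
    """
    root = ET.Element(f"{{{RDF}}}RDF")

    channel = ET.SubElement(
        root, f"{{{RSS}}}channel", {f"{{{RDF}}}about": channel_url}
    )
    ET.SubElement(channel, f"{{{RSS}}}title").text = title
    ET.SubElement(channel, f"{{{RSS}}}link").text = channel_url
    ET.SubElement(channel, f"{{{RSS}}}description").text = description

    # The channel lists its items by URI inside an rdf:Seq.
    items = ET.SubElement(channel, f"{{{RSS}}}items")
    seq = ET.SubElement(items, f"{{{RDF}}}Seq")
    for grant in grants:
        ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}resource": grant["url"]})

    # Each item is a sibling of the channel, identified by rdf:about.
    for grant in grants:
        item = ET.SubElement(
            root, f"{{{RSS}}}item", {f"{{{RDF}}}about": grant["url"]}
        )
        ET.SubElement(item, f"{{{RSS}}}title").text = grant["title"]
        ET.SubElement(item, f"{{{RSS}}}link").text = grant["url"]
        ET.SubElement(item, f"{{{RSS}}}description").text = grant["summary"]

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical data; not real grants or real URLs.
    example = [
        {
            "url": "http://example.org/grants/1",
            "title": "Example grant",
            "summary": "A made-up grant summary.",
        }
    ]
    print(build_feed("http://example.org/feed", "Example feed", "Demo", example))
```

Note that the sketch uses the page URL as the item's `rdf:about`, which is the same shortcut questioned in the TODO list below.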
TODO:
- The XML needs to be pretty printed. As far as I can tell, SXML doesn't actually have a working SXML->XML converter which handles namespaces, so there's a bit of a kludge in there, and the easiest way to get valid RDF out was to avoid using whitespace at all!
- SXML->XML can't generate CDATA blocks as far as I can see, either, so the encoded description is just the same paragraph as the unencoded one. I want to get all of the information (start and end dates, people, value) into that version so it comes up formatted nicely in a standard RSS reader.
- The use of rdf:about URIs is broken. At the moment the item URI is the URI of the page for that grant on the EPSRC web site, which I don't think is right.
- The structured data included with each item needs refining. I'd like to produce foaf:Projects for each grant, foaf:Persons for each person, and foaf:Organizations for each institution and department, with appropriate associations between them.
- At the moment the script just grabs the last 15 announcements. This follows the RSS spec recommendation, but today's new batch of Industrial Training grants shows that it fails miserably in this situation. Of course, you might argue that Industrial Training grants are not of great interest, so I guess I need to add some filtering, perhaps just filtering out everything which is not a "Standard Research" grant. This was a problem with the MyRSS feed of grants, too.
All of which can wait; the basics are there.
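The filtering idea from the last TODO item might look something like the sketch below. It is in Python, not the Scheme the scraper is written in, and the `latest_grants` function and the `scheme` and `announced` fields are hypothetical names for illustration. The one point it makes is that the filter has to run before the cut to 15 items, otherwise a big batch of Industrial Training grants still pushes everything else out.

```python
from datetime import date


def latest_grants(grants, limit=15, scheme="Standard Research"):
    """Return the most recent `limit` grants of the given scheme.

    Each grant is assumed to be a dict with a "scheme" string and an
    "announced" date. Filtering happens before truncation so that a
    large batch of other grant types cannot crowd out the wanted ones.
    """
    wanted = [g for g in grants if g["scheme"] == scheme]
    newest_first = sorted(wanted, key=lambda g: g["announced"], reverse=True)
    return newest_first[:limit]


if __name__ == "__main__":
    # Made-up data: a batch of 20 training grants plus two research grants.
    sample = [
        {"title": f"Training {i}", "scheme": "Industrial Training",
         "announced": date(2003, 10, 6)}
        for i in range(20)
    ]
    sample.append({"title": "Research A", "scheme": "Standard Research",
                   "announced": date(2003, 10, 1)})
    sample.append({"title": "Research B", "scheme": "Standard Research",
                   "announced": date(2003, 10, 3)})
    for grant in latest_grants(sample):
        print(grant["announced"], grant["title"])
```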
Incidentally, I have been using MyRSS for this, which worked, but provided only grant titles.
OK, so what is RSS?
In short, a news syndication mechanism. Web sites can produce a list of new items in a special format (let's not go into the details of that here; it's not important), which a news aggregator can read. Your aggregator goes out to these sites at regular intervals and checks for new items. It's a bit like subscribing to an email newsletter, except it's easier to unsubscribe, and you don't need to give anyone your email address (since your computer, or a web service, goes out and grabs the news rather than the remote site recording your interest).
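For the curious, the "checks for new items" step can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular aggregator works: the `new_items` function is invented for this example, it handles RSS 1.0 only, and it takes the feed as a string so that fetching it from the web is left out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"


def new_items(feed_xml, seen):
    """Return (uri, title) pairs for items not seen on earlier checks.

    `seen` is a set of item URIs that the caller keeps between checks;
    it is updated in place with any newly found URIs.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.findall(f"{{{RSS}}}item"):
        uri = item.get(f"{{{RDF}}}about")
        if uri not in seen:
            seen.add(uri)
            title = item.findtext(f"{{{RSS}}}title", default="")
            fresh.append((uri, title))
    return fresh
```

An aggregator would call something like this for each subscribed feed every hour or so, keeping the `seen` set between runs and showing you only what comes back.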
There are RSS feeds from a lot of mainstream news sources, such as BBC News and Guardian Unlimited. Weblogs such as mine usually provide them. Slowly, technical publishers are catching on.
Aggregators, then. I urge you to look into this. The easiest way to investigate RSS is to use Bloglines. This is an online service, so you still use a web browser to check up on news, but you only need to look in one place however many feeds you subscribe to. Desktop aggregators include FeedDemon and NewsGator; NewsGator is a plugin for Microsoft Outlook. Mac users might try NetNewsWire.
There are directories of RSS feeds, too, including Syndic8 and Feedster.
MyRSS allows you to create RSS feeds from news web pages which don't provide them by simply giving it the URL of the page. MyRSS feeds I have created are:
- Environment Agency News

- IST Results - Promoting Innovation for the Information Society

- SEPA What's New

- SEPA Newsroom | Releases/Statements | 2003

- SEPA Initiatives

- EPSRC Web Site - News

- EPSRC Web Site - Publications

- EPSRC Web Site - What's New

- EPSRC Web Site - Press Releases

- CORDIS: News service: Home

- CORDIS: News service: Forthcoming Events

- CORDIS: News service : Interviews

- CORDIS: Press Service: Press Releases Home

- CORDIS MSS: UK: What's New ?

- Defra, UK - News stories index

- Defra, UK - Weekly focus index 2003

- Defra, UK - Defra statements

- Defra, UK - About Defra - Ministers - Ministers' statements

One last thing. For those of you who live in London, have a look at the londonartaggregator.
OK, I think I almost understand that. What do I have to do to my web site to make it produce RSS feeds?
Posted by: Bill Harvey | October 05, 2003 at 09:58 PM
With typepad? In the typepad interface, select the weblog, then "Configure", then "Publicity & Syndication", and there's an option at the bottom of the page.
In actual fact, in your case I think it's on already; it's just not advertised. For that you need to go to "Design", "Content", and make sure "Syndicate Link" is selected. Then use "Order" to decide where to put it on the page.
Producing it from a web site which doesn't generate "what's new" or news lists automatically is probably not a realistic option.
Posted by: Hamish Harvey | October 05, 2003 at 11:08 PM