Several years ago, we digitised some papers relating to the explorer Matthew Flinders and put them online as the Flinders archive. I’ve been looking at that site with an eye to redeveloping it. Firstly, the markup needs overhauling to bring it up to the same standard as sites like the prints and drawings catalogue. Secondly, it’d be nice to come up with a good model for publishing written papers online. Most of the two million or so objects in the Maritime Museum’s collections are bits of paper: log books, letters, diaries, crew lists and who knows what else. There’s a copy of the Declaration of Independence, letters from Napoleon and Nelson’s last letter to his daughter. You can see photos of these documents online, but the original text isn’t available.
HTML isn’t well-suited to marking up written papers. It doesn’t have the tags to describe, semantically, the structure and contents of a written letter, for example. So the papers on the Flinders site were digitised by transcribing the text in TEI-Lite. TEI is an XML markup language designed specifically to represent the written word – poetry, plays, novels, letters etc. Consequently, we have fairly rich semantic versions of the Flinders papers stored on our database server, but that semantic information is lost in the HTML versions of the papers. I’m interested in coming up with HTML templates for the papers which preserve, as far as possible, the information stored in the TEI documents.
Microformats and tagging could help here. TEI has the <rs> tag to mark up references to things (people, places, ships, books, constellations, anything really) in a text. The Flinders papers use <rs> extensively to mark up the people, places, vessels and miscellaneous glossary terms referred to in the correspondence. For instance, a reference to a person might read <rs type="person" key="25">my dear wife</rs> or a reference to a ship might read <rs type="vessel" key="27">this sturdy vessel</rs>. The type attribute indicates what type of thing we’re referring to. The key attribute identifies which specific thing we’re referring to. In this case, it might be person number 25, or ship number 27, in our database.
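To make the type and key attributes concrete, here is a minimal Python sketch that pulls every <rs> reference out of a TEI fragment. The sentence wrapped around the two <rs> elements is invented for illustration, and the sketch assumes the elements carry no XML namespace; a namespaced TEI document would need the namespace added to the element name.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the two <rs> elements come from the examples above,
# the surrounding sentence is made up for illustration.
TEI_FRAGMENT = (
    '<p>I write to <rs type="person" key="25">my dear wife</rs> '
    'from aboard <rs type="vessel" key="27">this sturdy vessel</rs>.</p>'
)


def list_references(xml_text):
    """Return a (type, key, text) tuple for every <rs> element in the fragment."""
    root = ET.fromstring(xml_text)
    return [
        (rs.get("type"), rs.get("key"), "".join(rs.itertext()))
        for rs in root.iter("rs")  # assumes un-namespaced TEI elements
    ]


references = list_references(TEI_FRAGMENT)
for ref_type, key, text in references:
    print(f"{ref_type} {key}: {text}")
```

Each tuple pairs the kind of thing referred to with its database key and the words actually used in the letter.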
My first idea for opening up the archive is to replace the numerical keys with Wikipedia-style tags, e.g. <rs type="person" key="Ann_Chappelle">my dear wife</rs> or <rs type="vessel" key="Investigator">this sturdy vessel</rs>. I think human-readable tags make the references easier to understand than having to go and look up which ship is vessel number 27. Tagging like this will also allow us to take advantage of the rel-tag microformat in our HTML, because we can transform our TEI <rs> tags into HTML link anchors: <a rel="tag" href="/tags/Ann_Chappelle">my dear wife</a>. Even better, we can take the TEI type attribute and put that in the tag URL to indicate what type of tag this is: <a rel="tag" href="/tags/person/Ann_Chappelle">my dear wife</a>. This is pretty cool, I think – we’re now indicating, in our HTML, that this letter refers to a person, and that person is called Ann Chappelle. I can use the tag to link together everything that refers to Ann, whether she’s referred to indirectly as ‘my dear wife’, directly as ‘Ann Flinders’ or by her maiden name ‘Ann Chappelle’. Additionally, I’ve opened up the data to be used by any tool that understands the rel-tag microformat. Finally, the tagging scheme is simple and looks like it will extend to other archives. For example, we may want to digitise papers from the Royal Observatory archives, which refer to the names of planets, asteroids or constellations. Or we might have to tag references to books. We could do this by adding new types to our tag URLs, e.g. /books/De_Revolutionibus or /astronomy/Mars.
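The transformation from <rs> to a rel-tag link could be sketched roughly like this in Python. This is only an illustration, not the archive's actual template code: the function name rs_to_rel_tag, the /tags base path as a parameter and the sample sentence are all assumptions, and it again assumes un-namespaced TEI elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical fragment using the human-readable keys proposed above;
# the surrounding sentence is made up for illustration.
TEI_FRAGMENT = (
    '<p>I write to <rs type="person" key="Ann_Chappelle">my dear wife</rs> '
    'from aboard <rs type="vessel" key="Investigator">this sturdy vessel</rs>.</p>'
)


def rs_to_rel_tag(xml_text, base="/tags"):
    """Rewrite each <rs type=... key=...> as <a rel="tag" href="base/type/key">."""
    root = ET.fromstring(xml_text)
    # Collect the elements first so renaming them cannot disturb the iteration.
    for rs in list(root.iter("rs")):
        href = f"{base}/{quote(rs.get('type'))}/{quote(rs.get('key'))}"
        rs.tag = "a"
        rs.attrib.clear()  # drop the TEI-only type and key attributes
        rs.set("rel", "tag")
        rs.set("href", href)
    return ET.tostring(root, encoding="unicode")


html = rs_to_rel_tag(TEI_FRAGMENT)
print(html)
```

The link text stays exactly as it was written in the letter, while the type and key move into the URL, which is where rel-tag expects to find the tag name.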
Perhaps this is what I should talk about at SemanticCamp – opening up the semantic information stored in museum collections.
I think this would be a good presentation. If you create any handouts or anything, maybe you could post them on your blog so losers like me can see them?