"Owing to the neglect of our defences and the mishandling of the German problem in the last five years, we seem to be very near the bleak choice between War and Shame. My feeling is that we shall choose Shame, and then have War thrown in a little later, on even more adverse terms than at present."Winston Churchill in a letter to Lord Moyne, 1938 [Gilbert 1991]
Surfing the Web, you find an announcement for a conference. You'd like to click the mouse and have three entries made automatically in your electronic calendar: the abstract deadline, the paper deadline, and the conference itself. Unless your computer understands natural language, this will never happen because there is no way for the author of the conference announcement to encode sufficient structural information in HTML.
You are in a library reading the Bible, a Harold Robbins novel, Hamlet, The Shipping News, TIME magazine, and the Encyclopaedia Britannica. They all look different. They all look good thanks to the $millions invested in graphic design on the part of the publishers of the respective items. You are surfing the Web reading Travels with Samantha, the Bible, Werdna's Humor Archive, and The Temptation of Saint Anthony. They all look the same. Of course, you can change their appearance to some extent by editing resource files on your UNIX machine or visiting dialog boxes on the Macintosh. However, do we really believe it is more efficient for each of 20 million people to spend five minutes designing a document badly than for one professional to spend a few days designing a document well?
HTML's impoverished formatting capabilities frustrate the would-be designer of beautiful documents. HTML's lack of structural tags frustrates the would-be provider of more advanced browsers.
[Note: a group of people at Los Alamos National Labs has already developed practical extensions of TeX and TeX viewers that allow documents with full TeX formatting and full hypertext linking to arbitrary URLs.]
Currently, though, our methodology for extending HTML is backwards. Based on our experiences with other formatting languages, we sit down and figure out what are the most typically needed commands and argue for their inclusion in HTML. People with a stake in keeping the language simple, either for aesthetic reasons or because they don't want to further snarl the pile of C code that serves them as a Web client, fight to keep these commands out.
Let's choose a set of 100 documents in advance and decide that we are going to put enough richness into HTML to capture the design intent in at least 98 of them. For example, crack open a copy of The English Patient [Ondaatje 1992]. Although its narrative style is about as unconventional as you'd expect for a Booker Prize winner, it is formatted very typically for a modern novel. Sections are introduced with a substantial amount of whitespace (3 cm), a large capital letter about twice the height of the normal font, and the first few words in small caps. Paragraphs are not typically separated by vertical whitespace as in Mosaic but by their first line being indented about three characters. (This makes dialog much easier to read than in Mosaic, by the way, where whitespace cuts huge gaps between short sentences and breaks the flow of dialog.) Chronological or thematic breaks are denoted by vertical whitespace between paragraphs, anywhere from one line's worth to a couple of centimeters. If the thematic break has been large, it gets a lot of whitespace and the first line of the next paragraph is not indented. If the thematic break is small, it gets only a line of whitespace and the first line of the next paragraph is indented.
The English Patient is not an easy book to read in paperback. It would become, however, a virtually impossible book to read in Mosaic because neither the author's nor the book designer's intents are expressible in HTML. As the author of Travels with Samantha, I have exactly the same problem. It looks great in PostScript and is formatted very similarly to my paperback copy of The English Patient. Our fileserver handles as many as 100,000 requests per day for pieces of Travels with Samantha and none of those pieces give my readers the quality of experience they'd get reading a version hastily printed out from the simplest word processor. We should demand better from two $30,000 workstations talking to each other over 45 Mbit/second T3 lines.
People argue that HTML isn't a formatting language. It is somehow supposed to be a structural representation of a document. Yet there is no tag for a thematic break, large or small. There is no way to indicate a section break. In short, even the simple requirements of fiction are utterly beyond HTML Level 3's capabilities, never mind what we'd need to automate the processing of conference announcements.
Once we have enough new tags in HTML to represent the author's intent in 98 of our 100 previously selected documents, then it is time to ponder the best way to capture the book designer's intent. If we are determined not to clutter up HTML with formatting directives, then surely we can add a STYLE-SHEET tag in the HEAD so that people who are willing to spend a day or two designing a book nicely can save the other 20 million people on the Internet the trouble of doing it themselves.
In the long run, people are not going to accept an expensive system that is inferior in many ways to a $5 paperback book. Eventually Web documents are going to contain formatting information. We might as well sit down with our 100 documents plus manuals for LaTeX, Adobe Acrobat, Frame's internal format, etc. and specify a rich system for capturing author and designer intent. Six months of torture for Web client programmers will ensue, but that is better than the Web documents and clients being out of sync six times in the next decade.
This wouldn't work because the committee could never think of all the useful fields. Five years from now, people are going to want to do new, different, and unenvisioned things with the Web and Web clients. Thus, a decentralized revision and extension mechanism is essential for a structure system to be useful.
There is a deeper reason why this wouldn't work. Nobody would be able to write parsers and user interfaces for it. If a user is developing a Web document, does he want to see a flat list of 10,000 fields and go through each one to decide which is relevant? If you are programming a parser to do something interesting with Web documents, do you want to deal with arbitrary combinations of 10,000 fields?
Each message type also has an associated list of suggested types for a reply message. For example, the suggested reply type for MEETING ANNOUNCEMENT is REQUEST FOR INFORMATION. Most importantly, the decomposition of message types into a kind-of hierarchy allows the automatic generation of helpful user interfaces. For example, once the system knows that the user is writing a LENS MEETING ANNOUNCEMENT, that determines which fields are offered for filling and what defaults are presented. Fields having to do with software bugs or New York Times articles are not presented and fields such as PLACE and TIME may be filled in with the usual room and time.
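To make the kind-of hierarchy concrete, here is a minimal Python sketch of how fields, defaults, and suggested reply types might be inherited. Only the type names MEETING ANNOUNCEMENT, LENS MEETING ANNOUNCEMENT, and REQUEST FOR INFORMATION and the fields PLACE and TIME come from the description above; the root MESSAGE type, its fields, and every function name are assumptions made for illustration, not the way Malone's system was actually built.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MessageType:
    """One node in a kind-of hierarchy of message types."""
    name: str
    parent: Optional["MessageType"] = None
    fields: List[str] = field(default_factory=list)          # fields introduced at this level
    defaults: Dict[str, str] = field(default_factory=dict)   # defaults introduced at this level
    suggested_replies: List[str] = field(default_factory=list)

    def all_fields(self) -> List[str]:
        """Fields offered to the author: inherited ones first, then local ones."""
        inherited = self.parent.all_fields() if self.parent else []
        return inherited + [f for f in self.fields if f not in inherited]

    def all_defaults(self) -> Dict[str, str]:
        """Defaults, with more specific types overriding their ancestors."""
        merged = dict(self.parent.all_defaults()) if self.parent else {}
        merged.update(self.defaults)
        return merged

    def replies(self) -> List[str]:
        """Suggested reply types, inherited when a type declares none of its own."""
        if self.suggested_replies or self.parent is None:
            return self.suggested_replies
        return self.parent.replies()


# Hypothetical root type and fields, assumed for the example.
MESSAGE = MessageType("MESSAGE", fields=["TO", "FROM", "SUBJECT"])
MEETING = MessageType(
    "MEETING ANNOUNCEMENT", parent=MESSAGE,
    fields=["PLACE", "TIME"],
    suggested_replies=["REQUEST FOR INFORMATION"],
)
LENS_MEETING = MessageType(
    "LENS MEETING ANNOUNCEMENT", parent=MEETING,
    defaults={"PLACE": "the usual room", "TIME": "the usual time"},
)


def blank_form(message_type: MessageType) -> Dict[str, str]:
    """Build the form a user interface would present for this message type."""
    defaults = message_type.all_defaults()
    return {name: defaults.get(name, "") for name in message_type.all_fields()}


form = blank_form(LENS_MEETING)
```

The form for a LENS MEETING ANNOUNCEMENT contains only the fields that its ancestors declare, with PLACE and TIME already filled in. Nothing about software bugs or newspaper articles appears because no ancestor of this type mentions those fields.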
What did Malone's team learn from this?
<meta name="type" content="conference-announcement">
<meta name="conference-name" content="Second Int'l WWW '94">
<meta name="conference-location-brief" content="Chicago">
<meta name="conference-location-full" content="Ramada-Congress Hotel, 520 South Michigan Avenue, Chicago, Illinois, USA">
<meta name="conference-date-start" content="17 October 1994">
<meta name="conference-date-end" content="20 October 1994">
<meta name="conference-abstracts-deadline" content="10 August 1994">
<meta name="conference-papers-deadline" content="15 September 1994">
would be part of the description for our conference and provide enough information for entries to be made automatically in a user's calendar.
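A client could turn those fields into calendar entries with very little code. The following Python sketch uses only the standard library; the function names, the tuple format for a calendar entry, and the wording of the entry labels are assumptions for illustration. It also assumes the dates are always written as in the example (day, English month name, year), which a real specification would have to pin down.

```python
from datetime import datetime
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from META elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.fields[attributes["name"]] = attributes["content"]


def parse_date(text):
    # Assumes dates like "17 October 1994" and an English-language locale.
    return datetime.strptime(text, "%d %B %Y").date()


def calendar_entries(html):
    """Return (date, label) pairs for a conference announcement, else an empty list."""
    collector = MetaCollector()
    collector.feed(html)
    f = collector.fields
    if f.get("type") != "conference-announcement":
        return []
    name = f["conference-name"]
    return [
        (parse_date(f["conference-abstracts-deadline"]), name + ": abstracts due"),
        (parse_date(f["conference-papers-deadline"]), name + ": papers due"),
        (parse_date(f["conference-date-start"]),
         name + " begins in " + f["conference-location-brief"]),
    ]


ANNOUNCEMENT = """
<head>
<meta name="type" content="conference-announcement">
<meta name="conference-name" content="Second Int'l WWW '94">
<meta name="conference-location-brief" content="Chicago">
<meta name="conference-date-start" content="17 October 1994">
<meta name="conference-abstracts-deadline" content="10 August 1994">
<meta name="conference-papers-deadline" content="15 September 1994">
</head>
"""

for day, label in calendar_entries(ANNOUNCEMENT):
    print(day.isoformat(), label)
```

The sketch yields the three entries wanted in the opening scenario: the abstract deadline, the paper deadline, and the conference itself. No natural language understanding is involved.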
It might not be pretty. It might not be compact. But it will work without causing any HTML level 2 client to choke.
There are a few obvious objections to this mechanism. The most serious objection is that duplicate information must be maintained consistently in two places. For example, if the conference organizers decide to change the abstracts deadline from 10 August to 15 August, they'll have to make that change both in the META element in the HEAD and in some human-readable area of the BODY.
An obvious solution is to expose the field names and contents to the reader directly, as is typically done with electronic mail and as is done in [Malone 1987]. When Malone added semiformal structure to hypertext [Malone 1989], he opted to continue exposing field names directly to users. However, that is not in the spirit of the Web. Stylistically, the best Web documents are supposed to read like ordinary text.
A better long-term solution is a smart editor for authors that presents a form full of the relevant fields for the document type and from those fields generates human-readable text in the BODY of the document. When the author changes a field, the text in the BODY changes automatically. Thus, no human is ordinarily relied upon to maintain duplicate data.
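The core of such an editor is a function that renders both the META elements and the human-readable text from a single set of fields. Here is a minimal Python sketch; the function name and the sentence template are assumptions for illustration, and a real editor would need a template for each document type.

```python
from html import escape


def render_announcement(fields):
    """Generate the META elements and the readable BODY from one set of fields."""
    meta = "\n".join(
        '<meta name="{}" content="{}">'.format(escape(name), escape(value))
        for name, value in fields.items()
    )
    # Hypothetical sentence template for the conference-announcement type.
    body = (
        "<p>{name} will be held in {place} from {start} to {end}. "
        "Abstracts are due {abstracts}; papers are due {papers}.</p>"
    ).format(
        name=escape(fields["conference-name"]),
        place=escape(fields["conference-location-brief"]),
        start=escape(fields["conference-date-start"]),
        end=escape(fields["conference-date-end"]),
        abstracts=escape(fields["conference-abstracts-deadline"]),
        papers=escape(fields["conference-papers-deadline"]),
    )
    return "<html><head>\n" + meta + "\n</head><body>\n" + body + "\n</body></html>"


FIELDS = {
    "type": "conference-announcement",
    "conference-name": "Second Int'l WWW '94",
    "conference-location-brief": "Chicago",
    "conference-date-start": "17 October 1994",
    "conference-date-end": "20 October 1994",
    "conference-abstracts-deadline": "10 August 1994",
    "conference-papers-deadline": "15 September 1994",
}

page = render_announcement(FIELDS)

# The organizers move the abstracts deadline; only the field changes.
revised_fields = {**FIELDS, "conference-abstracts-deadline": "15 August 1994"}
revised = render_announcement(revised_fields)
```

Because the HEAD and the BODY are regenerated from the same dictionary, the new deadline shows up in both places and the old one in neither.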
Whatever mechanism we propose, therefore, had better allow for an organization to develop further specialized types that facilitate clever processing and presentation. At the same time, should one of these hyperspecialized documents be let loose on the wider Internet, it should carry some type information understandable to unsuspecting clients. One mechanism for doing this is the inclusion of an extra type specification:
<meta name="type" content="lanl-acl-conference-announcement">
<meta name="most-specific-public-type" content="conference-announcement">
In this case, the Los Alamos National Laboratory's Advanced Computing Laboratory has concocted a highly specialized type of conference announcement that permits extensive automated processing by Web clients throughout Los Alamos. However, should someone at MIT be looking at the conference announcement, his Web client would fail to recognize the type LANL-ACL-CONFERENCE-ANNOUNCEMENT and look at the MOST-SPECIFIC-PUBLIC-TYPE field. As CONFERENCE-ANNOUNCEMENT is a superclass of LANL-ACL-CONFERENCE-ANNOUNCEMENT, all the things that the MIT user's client is accustomed to doing with conference announcements should work with this one.
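The client-side logic for this fallback is short. In the Python sketch below, the handler functions, the dictionaries of known types, and the final fallback to a plain-document handler are all assumptions for illustration; only the two META names and the two type names come from the example above.

```python
def choose_handler(meta_fields, handlers, fallback):
    """Pick the most specific handler this client knows about."""
    for key in ("type", "most-specific-public-type"):
        type_name = meta_fields.get(key)
        if type_name in handlers:
            return handlers[type_name]
    return fallback


# Hypothetical handlers; a real client would do something more useful.
def show_plain(fields):
    return "plain document"


def show_conference(fields):
    return "conference announcement with calendar entries"


def show_lanl(fields):
    return "LANL ACL announcement with local processing"


DOCUMENT = {
    "type": "lanl-acl-conference-announcement",
    "most-specific-public-type": "conference-announcement",
}

MIT_HANDLERS = {"conference-announcement": show_conference}
LANL_HANDLERS = {
    "conference-announcement": show_conference,
    "lanl-acl-conference-announcement": show_lanl,
}

mit_view = choose_handler(DOCUMENT, MIT_HANDLERS, show_plain)(DOCUMENT)
lanl_view = choose_handler(DOCUMENT, LANL_HANDLERS, show_plain)(DOCUMENT)
```

The same document gets the specialized treatment at Los Alamos and the ordinary conference-announcement treatment at MIT. A client that knows neither type still displays it as a plain document.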
Nonhierarchical inheritance (also known as "multiple inheritance") is also important so that duplicate type hierarchies are not spawned. For example, the fact that a document is restricted to a group or company might possibly apply to any type of document. Should there be two identical trees, one rooted at BASIC-DOCUMENT and the other at BASIC-INTERNAL-DOCUMENT? Then we might imagine documents for which there is an access charge. Now we just need four identical trees, rooted at BASIC-FREE-DOCUMENT, BASIC-METERED-DOCUMENT, BASIC-INTERNAL-FREE-DOCUMENT, BASIC-INTERNAL-METERED-DOCUMENT. There is a better way and it was demonstrated in the MIT Lisp Machine Flavor system (a Smalltalk-inspired object system grafted onto Lisp around 1978): mixins. Mixins are orthogonal classes that can be combined in any order and with any of the classes in the standard kind-of hierarchy. Here are some example mixin classes:
If there are N mixins recognized in the public type registry, we might have to have 2^N classes for every class in the old kind-of hierarchy. That's one for every possible subset of mixins, so we'd have classes like TRAVEL-MAGAZINE, TRAVEL-MAGAZINE-RESTRICTED, TRAVEL-MAGAZINE-DRAFT, TRAVEL-MAGAZINE-DRAFT-RESTRICTED, etc. This doesn't seem like a great improvement on the 2^N identical trees situation.
However, if we allow documents to specify their fundamental type and mixins separately
<meta name="type" content="travel-magazine">
<meta name="mixin-types" content="restricted-mixin">
and build the final type at runtime in both the HTTP server and the Web client, then we need only have one hierarchy plus a collection of independent orthogonal mixins. This presents no problem for programmers using modern computer languages such as Smalltalk and Common Lisp that allow type definition at run-time, but programs implemented in primitive languages (e.g., C++) that have purely static types are going to essentially need their own dynamic type system.
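Python is another language that allows type definition at run-time, so it can illustrate the idea. In the sketch below the class names, the describe method, the DRAFT mixin's behavior, and the assumption that several mixins would be separated by whitespace in the MIXIN-TYPES field are all hypothetical; the original proposal does not specify them.

```python
class BasicDocument:
    def describe(self):
        return "document"


class Magazine(BasicDocument):
    def describe(self):
        return "magazine"


class TravelMagazine(Magazine):
    def describe(self):
        return "travel magazine"


class RestrictedMixin:
    """Orthogonal behavior: can be combined with any document class."""
    def describe(self):
        return "restricted " + super().describe()


class DraftMixin:
    def describe(self):
        return "draft " + super().describe()


# One kind-of hierarchy plus an independent collection of mixins.
PUBLIC_TYPES = {
    "basic-document": BasicDocument,
    "magazine": Magazine,
    "travel-magazine": TravelMagazine,
}
MIXINS = {"restricted-mixin": RestrictedMixin, "draft-mixin": DraftMixin}


def build_type(type_name, mixin_types=""):
    """Build the final class at run-time from the two META fields."""
    base = PUBLIC_TYPES[type_name]
    # Assumption: multiple mixins are separated by whitespace.
    mixins = tuple(MIXINS[name] for name in mixin_types.split())
    class_name = "".join(cls.__name__ for cls in mixins) + base.__name__
    # Mixins come first so that their methods wrap those of the base class.
    return type(class_name, mixins + (base,), {})


restricted_travel = build_type("travel-magazine", "restricted-mixin")
document = restricted_travel()
summary = document.describe()
```

With N mixins in the registry we define N mixin classes, not 2^N variants of every document class. A combination comes into existence only when some document actually asks for it.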
We know then that we need multiple inheritance and distributed extensibility. A standard Internet approach to distributed maintenance of a hierarchy is found in the Domain Name System (DNS), where authority for a zone is parcelled out and that authority includes the ability to parcel out subzones [Stevens 1994; Mockapetris 1987a; Mockapetris 1987b]. A DNS-style system might seem like overkill initially and would result in some delays for pioneer users of document types because, without a substantial local cache, document type queries would have to be sent across the Internet for practically every Web document viewed.
Regardless of how the hierarchy is maintained, developing the initial core taxonomy is a daunting task. Fortunately, librarians have been at it for hundreds of years and have done most of the work for us. The USMARC, ****, [**** add references] are carefully thought out systems and should serve as the basis for our tree.
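To see why the local cache matters, consider a toy model of a delegated type registry in Python. Everything here is an assumption made for illustration: the dotted naming scheme borrowed from domain names, the Zone and Resolver classes, and the count of one remote query per delegation step. It is a sketch of the idea, not a protocol design.

```python
class Zone:
    """A registry zone: defines types and delegates subzones to other organizations."""

    def __init__(self, name=""):
        self.name = name
        self.types = {}      # type name -> name of its parent type
        self.subzones = {}   # label -> Zone maintained by someone else

    def delegate(self, label):
        zone = Zone(label + "." + self.name if self.name else label)
        self.subzones[label] = zone
        return zone


class Resolver:
    """Client-side lookup with a local cache."""

    def __init__(self, root):
        self.root = root
        self.cache = {}
        self.remote_queries = 0

    def parent_of(self, qualified_name):
        """Resolve names of the hypothetical form 'type.subzone.zone', read right to left."""
        if qualified_name in self.cache:
            return self.cache[qualified_name]
        type_name, *labels = qualified_name.split(".")
        zone = self.root
        for label in reversed(labels):
            self.remote_queries += 1   # ask the current zone who runs the subzone
            zone = zone.subzones[label]
        self.remote_queries += 1       # ask the final zone about the type itself
        parent = zone.types[type_name]
        self.cache[qualified_name] = parent
        return parent


root = Zone()
root.types["conference-announcement"] = "basic-document"
lanl = root.delegate("lanl")
acl = lanl.delegate("acl")   # LANL parcels out a subzone of its own
acl.types["conference-announcement"] = "conference-announcement"

resolver = Resolver(root)
first = resolver.parent_of("conference-announcement.acl.lanl")
queries_after_first = resolver.remote_queries
second = resolver.parent_of("conference-announcement.acl.lanl")
queries_after_second = resolver.remote_queries
```

The first lookup costs one query per level of delegation plus one for the type. The second lookup of the same type costs nothing, which is the difference between a usable system and one that crosses the Internet for every document viewed.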
Malone, Thomas W., Grant, Kenneth R., Lai, Kum-Yew, Rao, Ramana, and Rosenblitt, David 1987. "Semistructured Messages are Surprisingly Useful for Computer-Supported Coordination." ACM Transactions on Office Information Systems, 5, 2, pp. 115-131.
Malone, Thomas W., Yu, Keh-Chiang, Lee, Jintae 1989. What Good are Semistructured Objects? Adding Semiformal Structure to Hypertext. Center for Coordination Science Technical Report #102. M.I.T. Sloan School of Management, Cambridge, MA.
Mockapetris, P.V. 1987a. "Domain Names: Concepts and Facilities," RFC 1034.
Mockapetris, P.V. 1987b. "Domain Names: Implementation and Specification," RFC 1035.
Ondaatje, Michael 1992. The English Patient. Vintage International, New York.
Stevens, W. Richard 1994. TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley, Reading, Massachusetts.