100th Anniversary Boston Marathon (1996).


Glossary
for Philip and Alex's Guide to Web Publishing

Application Server
see "Middleware"
AOLserver
Released in early 1995 as "NaviServer", AOLserver remains the most powerful Web server program on the market (and it is free). It is a multi-threaded server that provides a lot of support for connecting to relational database management systems. Nearly all of the programming examples in this book are written in the AOLserver Tcl API. AOLserver is documented at www.aolserver.com.
Apache
The world's most popular Web server. Originally not nearly as powerful as AOLserver, Apache had one huge advantage: the source code was available right from the start (NaviServer was initially a commercial product). This, coupled with the failure of Windows NT to work reliably and the failure of Netscape to have any clue about what Web publishers need, has made Apache dominant. See the server chapter for some discussion of Apache's pros and cons.
API
Application Programming Interface. An abstraction barrier between custom/extension code and a core, usually commercial, program. The goal of an API is to let you write programs that won't break when you upgrade the underlying system. The authors of the core program are saying, "Here are a bunch of hooks into our code. We guarantee and document that they will work a certain way. We reserve the right to change the core program but we will endeavor to preserve the behavior of the API call. If we can't, then we'll tell you in the release notes that we broke an old API call."
ASP
Active Server Pages, developed by Microsoft. This is the standard programming system for Web sites built on Windows NT. It is bundled with Internet Information Server (IIS) when you buy the Windows NT Server operating system. The fundamental idea is that you write HTML pages with little embedded bits of Visual Basic that are interpreted by the server. See the server programming chapter for more info.
Cable Modem
A cable modem is an Internet connection provided by a cable TV operator, typically with at least 1.5 Mbits per second of download bandwidth (50-100 times faster than modems that work over analog telephone lines).
Cache
Computer systems typically incorporate capacious storage devices that are slow (e.g., disk drives) and smaller storage devices that are fast (e.g., memory chips, which are 100,000 times faster than disk). File systems and database management systems keep recently used information from the slow devices in a cache in the fast device.
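The caching idea above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular file system or database does it: the class name, the two-entry capacity, and the dictionary standing in for a slow disk are all assumptions made for the example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Keep recently used values from a slow store in a small fast one."""

    def __init__(self, slow_read, capacity=2):
        self.slow_read = slow_read   # function that reads from the slow device
        self.capacity = capacity     # how many entries fit in fast storage
        self.fast = OrderedDict()    # least recently used entry comes first
        self.slow_reads = 0          # how often we had to go to the slow device

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)     # mark as most recently used
            return self.fast[key]
        self.slow_reads += 1
        value = self.slow_read(key)
        self.fast[key] = value
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)  # evict the least recently used entry
        return value


# A dictionary stands in for the disk in this sketch.
disk = {"a.html": "<h1>A</h1>", "b.html": "<h1>B</h1>", "c.html": "<h1>C</h1>"}
cache = ReadThroughCache(disk.__getitem__, capacity=2)
for name in ["a.html", "a.html", "b.html", "a.html", "c.html", "b.html"]:
    cache.get(name)
print(cache.slow_reads, "slow reads for 6 requests")
```

Repeated requests for the same file are answered from the fast store; only a file that has not been used recently costs a trip to the slow one.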
CGI
Common Gateway Interface. This is a standard that lets programmers write Web scripts without depending on details of the Web server program being used. Thus, for example, a Web service implemented in CGI could be moved from a site running AOLserver to a site running Apache. See the server programming chapter.
Client/Server
In the 1960s, computers were so expensive that each company could only have one. "The computer" ran one program at a time, typically reading instructions and data from punch cards. This was batch processing. In the 1970s, that computer was able to run several programs simultaneously, responding to users at interactive terminals. This was timesharing (it would be nice if modesty prevented me from noting that this was developed by my lab at MIT circa 1960). In the 1980s, companies could afford lots of computers. The big computers were designated servers and would wait for requests to come in from a network of client computers. The client computer might sit on a user's desktop and produce an informative graph of the information retrieved from the server. The overall architecture was referred to as client/server. Because of the high cost of designing, developing, and maintaining the programs that run on the client machines, Corporate America is rapidly discarding this architecture in favor of Intranet: Client machines run a simple Web browser and servers do more of the work required to present the information.
Community Site
A community site exists to support the interaction of an online community of users. These users typically come together because of a shared interest and are most vibrant when there is an educational dimension, i.e., when the more experienced users are helping the novices improve their skills.
Compression
When storing information in digital form, it is often possible to reduce the amount of space required by exploiting regular patterns in the data. For example, documents written in English frequently contain "the". A compression system might notice this fact and represent the complete word "the" (24 bits) with a shorter code. A picture containing your friend's face plus a lot of blue sky could be compressed if the upper region were described as "a lot of blue sky". All popular Web image, video, and sound formats incorporate compression.
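The "a lot of blue sky" idea can be sketched with run-length encoding, one of the simplest compression schemes. This is only an illustration in Python; the Web formats mentioned above use more elaborate methods, and the pixel names here are made up for the example.

```python
from itertools import groupby


def rle_encode(pixels):
    """Replace each run of identical values with a (value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (value, run length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# One hypothetical row of an image: sky, a bit of face, more sky.
row = ["blue"] * 12 + ["skin"] * 3 + ["blue"] * 5
encoded = rle_encode(row)
print(len(row), "pixels stored as", len(encoded), "runs")
assert rle_decode(encoded) == row  # nothing was lost
```

The scheme only pays off when the data really does contain long runs; a row in which every pixel differs from its neighbor would come out larger than it went in.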
Data Model
A data model is the structure in which a computer program stores persistent information. In a relational database, data models are built from tables. Within a table, information is stored in homogeneous columns, e.g., a column named registration_date would contain information only of type date. A data model is interesting because it shows what kinds of information a computer application can process. For example, if there is no place in the data model for the program to store the IP address from which content was posted, the publisher will never be able to automatically delete all content that came from the IP address of a spammer.
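The spammer example above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the table name, column names, IP addresses, and sample rows are all hypothetical. The point is simply that the cleanup is a one-line statement only because the data model has an ip_address column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE postings (
        posting_id  INTEGER PRIMARY KEY,
        posted_on   DATE NOT NULL,
        ip_address  TEXT NOT NULL,   -- without this column, the cleanup below is impossible
        body        TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO postings (posted_on, ip_address, body) VALUES (?, ?, ?)",
    [
        ("1998-06-01", "18.23.0.1", "Which lens for Samoyed portraits?"),
        ("1998-06-02", "10.9.8.7", "MAKE MONEY FAST"),
        ("1998-06-02", "10.9.8.7", "MAKE MONEY FASTER"),
    ],
)

# Delete everything that came from the spammer's address.
deleted = conn.execute(
    "DELETE FROM postings WHERE ip_address = ?", ("10.9.8.7",)
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM postings").fetchone()[0]
print(deleted, "postings deleted;", remaining, "remaining")
```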
Dynamic Site
A dynamic site is one that is able to collect information from User A, serve it back to Users B and C immediately, and hide it from User D because the server knows that User D isn't interested in this kind of content. Dynamic sites are typically built on top of relational database management systems because these programs make it easy to organize content submitted by hundreds of concurrent users. An example of a simple dynamic site would be a classified ad system.
Electronic Document Interchange (EDI)
A standard for exchanging business documents, such as invoices and purchase orders.
Emacs
World's most powerful text editor, written by Richard Stallman (RMS) in 1976 for the Incompatible Timesharing System (ITS) on the PDP-10s at MIT. Emacs has since been ported to virtually every kind of computer hardware and operating system (including the Macintosh, Windows 95/NT, and every flavor of Unix). Good programmers tend to spend their entire working lives in Emacs, which is capable of functioning as a mail reader, USENET news reader, Web browser, shell, calendar, calculator, and Lisp evaluator. Emacs is infinitely customizable because users can write their own commands in Lisp. You can find out more about Emacs at ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-519A.pdf (Stallman's 1979 MIT AI Lab report), at www.gnu.org (where you can download the source code for free), or by reading Learning Emacs (Cameron et al 1996; O'Reilly). If you want to program Emacs, you'll want Writing GNU Emacs Extensions (Bob Glickstein 1997; O'Reilly).
Firewall
A computer that sits between a company's internal network of computers and the public Internet. The firewall's job is to make sure that internal users can get out to enjoy the benefits of the Internet while external crackers are unable to make connections to machines behind the firewall.
Flat File
A flat-file database keeps information organized in a structured manner, typically in one big file. A desktop spreadsheet application is an example of a flat-file database management system. These are useful for Web publishers preparing content because a large body of information can be assembled and then distributed in a consistent format. Flat-file databases typically lack support for processing transactions (inserts and updates) from concurrent users. Thus, collaboration or ecommerce Web sites generally rely on a relational database management system as a back-end.
GIF
Graphics Interchange Format. Developed in 1987 by CompuServe, this is a way of storing compressed images with up to 256 colors. It became popular on the Web because it was the only format that could be displayed in-line by the first multi-platform Web browser (NCSA Mosaic). The use of GIF versus JPEG is discussed in the images chapter.
HTML
Hyper Text Markup Language. Developed by Tim Berners-Lee, this specifies a format for the most popular kind of document distributed over the Web (via HTTP). Documented sketchily in my HTML chapter, documented badly at http://www.w3.org, and documented well in HTML: The Definitive Guide (Musciano and Kennedy 1998; O'Reilly).
HTTP
Hyper Text Transfer Protocol. Developed by Tim Berners-Lee, this specifies how a Web browser asks for a document from a Web server. Questions such as "how does a server tell the browser that a document has moved?" or "how does a browser ask the time that a document was last modified?" may be answered by reference to this protocol, which is documented badly at http://www.w3.org and documented well in Web Client Programming (Clinton Wong 1997; O'Reilly).
IIS
Internet Information Server. A Web server program that is included by Microsoft when you purchase the Windows NT Server operating system. As Larry Ellison notes, this is not the same as "free". Rather than compete with other vendors of Web server programs, Microsoft puts its product into the operating system that everyone has to buy (unless they free themselves with Linux) and then raises the price of the operating system. The best part about IIS is Active Server Pages (ASP), described in the server programming chapter. The worst part about IIS is the comparative unreliability of Windows NT.
Java
Java is first a programming language, developed by Sun Microsystems around 1992, intended for use on the tiny computers inside cell phones and similar devices. Java is second an interpreter, the Java virtual machine, compiled into popular Web browsers such as Netscape Navigator. Java is third a security system that purports to guarantee that a program downloaded from an untrusted source on the Internet can run safely inside the interpreter. Java is the only realistic way for a Web publisher to take advantage of the computing power available on a user's desktop. Java is generally a bad language for server-side software development (see the server programming chapter). For more background on the language, see the Java chapter from Database Backed Web Sites at http://www.photo.net/wtr/dead-trees/53008.htm.
JPEG
Joint Photographic Experts Group. A bunch of guys who sat down and designed a standard for image compression, conveniently titled "IS 10918-1 (ITU-T T.81)". This standard works particularly well for 24-bit color photographs. C-Cube Microsystems came up with the JFIF standard for encoding color images in a file. Such a file is what people commonly refer to as "a JPEG" and typically ends in ".jpg" or ".jpeg". See the images chapter for tips on producing JPEGs for the Web. See www.jpeg.org for more about the standard.
Linux
A free version of the Unix operating system, primarily composed of tools developed over a 15-year period by Richard Stallman and Project GNU. However, the final spectacular push was provided by Linus Torvalds who wrote a kernel (completed in 1994), organized a bunch of programmers Internet-wide, and managed releases. Currently, because it can be installed on any Wintel box, Linux is the most likely vehicle by which users can free themselves from the Microsoft monopoly (see http://www.photo.net/philg/humor/bill-gates.html). I discuss Linux in the server chapter. Linux is free but you'll save yourself a lot of pain if you buy a well-organized and easy-to-install version from www.redhat.com. A good example of how commercialization and banner ads have disfigured even the unlikeliest corners of the Web is www.linux.org.
Lisp
Lisp is the most powerful and also easiest to use programming language ever developed. Invented by John McCarthy at MIT in the late 1950s, Lisp is today used by the most sophisticated programmers pushing the limits of computers in mathematical physics, computer-aided engineering, and computer-aided genetics. Lisp is also used by thousands of people who don't think of themselves as programmers at all, only people who want to define shortcuts in AutoCAD or the Emacs text editor. The best introduction to Lisp is also the best introduction to computer science: Structure and Interpretation of Computer Programs (Abelson and Sussman 1996; MIT Press).
Log Analyzer
A program that reads a Web server's log file (one line per request served) and produces a comprehensible report with summary statistics, e.g., "You served 234,812 requests yesterday to 2,039 different computers; the most popular file was /samoyed-faces.html".
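A toy log analyzer fits in a few lines of Python. This is a sketch under assumptions: the three sample lines are invented and follow the widely used Common Log Format, and a real analyzer would have to cope with malformed lines and much larger files.

```python
from collections import Counter

# Hypothetical log excerpt: one line per request served.
LOG = """\
18.23.0.1 - - [02/Jun/1998:10:00:01 -0400] "GET /samoyed-faces.html HTTP/1.0" 200 5120
18.23.0.1 - - [02/Jun/1998:10:00:02 -0400] "GET /images/alex.jpg HTTP/1.0" 200 20480
10.9.8.7 - - [02/Jun/1998:10:05:40 -0400] "GET /samoyed-faces.html HTTP/1.0" 200 5120
"""


def summarize(log_text):
    """Return (total requests, distinct client hosts, most requested file)."""
    hosts = set()
    files = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        host = line.split()[0]         # first field is the client address
        request = line.split('"')[1]   # e.g. GET /samoyed-faces.html HTTP/1.0
        path = request.split()[1]
        hosts.add(host)
        files[path] += 1
    top_file, _hits = files.most_common(1)[0]
    return sum(files.values()), len(hosts), top_file


requests, computers, most_popular = summarize(LOG)
print(f"You served {requests} requests to {computers} different computers; "
      f"the most popular file was {most_popular}")
```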
Magic Cookie
The Magic Cookie protocol allows a Web service to conveniently maintain a "session" with a particular user. The Web server sends the client a "magic cookie" (piece of information) that the client is required to return on subsequent requests. The original specification is at http://home.netscape.com/newsref/std/cookie_spec.html.
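The exchange described above can be sketched with Python's standard http.cookies module. The cookie name session_id, its value, and the in-memory sessions table are assumptions made for the example; a real service would generate unguessable values and store sessions somewhere durable.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side session table, keyed by the cookie value.
sessions = {"abc123": {"shopping_cart": ["tripod"]}}

# First response: the server hands the client its magic cookie.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
set_cookie_header = outgoing.output(header="Set-Cookie:")
print(set_cookie_header)

# Later request: the client returns the value in a "Cookie:" header,
# and the server uses it to find the right session.
incoming = SimpleCookie()
incoming.load("session_id=abc123")
session = sessions.get(incoming["session_id"].value)
print(session)
```

The cookie itself carries only an identifier; everything interesting about the session stays on the server.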
Magnet Content
Material authored by a publisher in hopes of establishing an online community. In the long run, a majority of the content in a successful community site will be user-authored.
Middleware
Software sold to people who don't know how to program by people who know how to program. In theory, middleware sits between your relational database management system and your application program and makes the whole system run more reliably. MBAs are lining up right now to buy the Netscape Application Server middleware for $35,000/CPU. Without the benefit of middleware, I'm able to support a few hundred simultaneous users on a cheap desktop Unix box running AOLserver. The mainframe studs are usually able to get a few thousand transactions per second through computers such as the airline reservations systems. With Netscape Application Server and a $200,000 8-CPU Unix box, though, the testers at PC Week were able to support ... 10 simultaneous users. To achieve this performance, it was necessary to restart the servers constantly and reboot the Unix box occasionally (see http://www.photo.net/wtr/application-servers.html).
MIME
Multi-Purpose Internet Mail Extensions. Developed in 1991 by Nathan Borenstein of Bellcore so that people could include images and other non-plain-text documents in e-mail messages. MIME is a critical standard for the World Wide Web because an HTTP server answering a request always includes the MIME type of the document served. For example, if a browser requests "foobar.jpg", the server will return a MIME type of "image/jpeg". The Web browser will decide, based on this type, whether or not to attempt to render the document. A JPEG image can be rendered by all modern Web browsers. If, for example, a Web browser sees a MIME type of "application/x-pilot" (for the .prc files that PalmPilots employ) the browser will invite the user to save the document to disk or select an appropriate application to launch for this kind of document.
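The two halves of that conversation can be sketched in Python. The extension table and the set of renderable types below are hypothetical stand-ins for a server's configuration and a browser's capabilities; the file name datebook.prc is made up.

```python
import os.path

# Hypothetical server-side table mapping file extensions to MIME types.
MIME_TYPES = {
    ".html": "text/html",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".prc": "application/x-pilot",
}

# Hypothetical set of types this imaginary browser can render itself.
RENDERABLE = {"text/html", "text/plain", "image/gif", "image/jpeg"}


def content_type_for(filename):
    """Server side: choose the MIME type to send along with the document."""
    extension = os.path.splitext(filename)[1].lower()
    return MIME_TYPES.get(extension, "application/octet-stream")


def browser_action(mime_type):
    """Browser side: decide what to do based only on the MIME type."""
    if mime_type in RENDERABLE:
        return "render in the browser window"
    return "offer to save to disk or launch a helper application"


for name in ["foobar.jpg", "datebook.prc"]:
    mime_type = content_type_for(name)
    print(name, mime_type, browser_action(mime_type), sep=" -> ")
```

Note that the browser never looks at the file name; the decision rests entirely on the type the server reports.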
Operating System (OS)
A big complicated computer program that lets multiple simultaneously executing big complicated computer programs coexist peacefully on one physical computer. The operating system is also responsible for hiding the details of the computer hardware from the application programmers, e.g., letting a programmer say "I want to write ABC into a file named XYZ" without the programmer having to know how many disk drives the computer has or what company manufactured those drives. Examples of operating systems are Unix and Windows NT. Examples of things that try to be operating systems but mostly fail to fulfill the "coexist peacefully" condition are Windows and the Macintosh OS.
Oracle
Oracle is the most popular relational database management system (RDBMS). It was developed by Larry Ellison's Oracle Corporation in the late 1970s. All of the example applications in this book were built using Oracle.
Perl
Perl is a scripting language developed by Larry Wall in 1986 to make his Unix sysadmin job a little easier. It unifies a bunch of capabilities from disparate older Unix tools. Like Unix, Perl is perhaps best described as "ugly but fast and useful". Perl is free, has particularly powerful string processing operators, and quickly developed a large following and therefore library for CGI scripting. For more info, see www.perl.com or www.perl.org.

Historical Note: Lisp programmers forced to look at Perl code would usually say "if there were any justice in this world, the guys who wrote this would go to jail." In a rare case of Lisp programmers getting their wish, in 1995 Intel Corporation persuaded local authorities to send Randal Schwartz, author of Learning Perl (O'Reilly 1997), to the Big House for 90 days (plus 5 years of probation, 480 hours of community service, and $68,000 of "restitution" to Intel). Sadly, however, it seems that Schwartz's official crime was not corrupting young minds with Perl syntax and semantics. At MIT, our Unix sysadmins periodically run a program called "crack" that tries to guess our passwords. When crack is successful, the sysadmins send us email saying "your password has been cracked; please change it to something harder to guess." Obviously they do not need our passwords since they have root access to all the boxes and can read any of our data. At MIT, you get paid about $50,000/year for doing this. In Oregon if you do this for a multi-billion dollar company that has recently donated $100,000 to the local law enforcement authorities, you've committed a crime. See http://www.lightlink.com/spacenka/fors/ for more on State of Oregon v. Randal Schwartz.

PhotoCD
A Kodak standard for scanning and storing images. Every image on a Kodak PhotoCD is available in five resolutions, the highest of which is 2,000x3,000 pixels (Pro PhotoCD contains one extra resolution: 4,000x6,000 pixels). These disks are made from original slides or negatives by specially equipped labs, the best of which are linked to from http://www.photo.net/photo/labs.html.
RDBMS
Relational Database Management System. A computer program that lets you store, index, and retrieve tables of data. The simplest way to look at an RDBMS is as a spreadsheet that multiple users can update. The most important thing that an RDBMS does is provide transactions. See the chapter "Choosing a Relational Database".
RMS
Richard M. Stallman. In 1976, developed Emacs, the world's best and most widely used text editor. Went on to develop gcc, the most widely used compiler for the C programming language. Won a $240,000 MacArthur fellowship in 1990. Stallman is the founder of the free software movement (see www.fsf.org), and Project GNU, which gave rise to Linux.
Robot
In the technologically optimistic portion of the 20th century, robots were intelligent anthropomorphic machines that understood human speech, interpreted visual scenes, and manipulated objects in the real world. In the technologically realistic 21st century, robots are absurdly primitive programs that do things like "Go look up this book title at three different online bookstores and see who has the lowest price; fail completely if any one of the online bookstores has added a comma to their HTML page." Also known as intelligent agents (an intellectually vacuous term but useful for getting tenure if you're a university professor). Some simple but very useful examples of robots are the spiders or Web crawlers that fill the content database at public search engine sites such as AltaVista.
Scalable
A marketing term used to sell defective software to executives at wealthy Web publishing companies. The Web is fundamentally about processing updates from thousands of concurrent users. This is what database management systems were built for. Smart engineers build Web services so that if the database is up and running, the Web site will be up and running. Period. Adding more users to the site will inevitably require adding capacity to the database management system, no matter what other software is employed. The thoughtful engineer will realize that a provably scalable site is one that relies on no other software besides the database management system (or that cheats with a thin layer of simple, reliable open-source software such as AOLserver or Apache).
Semantic Tag
The most popular Web markup language is HTML, which provides for formatting tags, e.g., "this is a headline" or "this should be rendered in italics." This is useful for humans reading Web pages. What would be more useful for computer programs trying to read Web pages is a semantic tag, e.g., "the following numbers represent the price of the product in dollars", or "the following characters represent the date this document was initially authored". XML and SGML are examples of systems that support communities of people who wish to exchange semantically tagged documents.
SGML
Standard Generalized Markup Language, standardized in 1986. A language for marking up documents so that they could be parsed by computer programs. Each community of people that wishes to author and parse documents must agree on a Document Type Definition (DTD), which is itself a machine-parsable description of what tags a marked-up document must or may have. HTML is an example of an SGML DTD. See Chapter 5.
Spider
A spider or Web crawler is a program that exhaustively surfs all the links from a page and returns them to another program for processing. For example, all of the Internet search engine sites rely on spider robots to discover new Web sites and add them to their index. Another typical use of a spider is by a publisher against his or her own site. The spider program makes sure that all of the links function correctly and reports dead links.
SQL
Structured Query Language. Developed by IBM in the mid-1970s as a way to get information into and out of relational database management systems. A fundamental difference between SQL and standard programming languages is that SQL is declarative. You specify what kind of data you want from the database; the RDBMS is responsible for figuring out how to retrieve it. I include a tutorial on SQL in the chapter "Database Management Systems".
Static Site
A static Web site comprises content that does not change depending on the identity of the user, the time of day, or what other users might have contributed recently. A static Web site is typically built using static documents in HTML format with graphics in GIF format and images in JPEG format. Collectively, these are referred to as static files. Contrast with a dynamic site, in which content can be automatically collected from users, personalized for the viewer, or changed as a function of the time of day.
Tcl
Tool Command Language. An interpreted computer language designed for rapid prototyping and maximum flexibility, Tcl was developed in the late 1980s by John Ousterhout, a professor at UC Berkeley. Personally I'd much rather use Lisp, but AOLserver has a compiled-in Tcl interpreter. In the server programming chapter, I explain why I think Tcl has been effective for Web/db application development. Tcl is documented at www.scriptics.com and www.tclconsortium.org.
TCP/IP
Transmission Control Protocol and Internet Protocol. These are the standards that govern transmission of data among computer systems. They are the foundation of the Internet. IP is a way of saying "send these next 1000 bits from Computer A to Computer B". TCP is a way of saying "send this stream of data reliably between Computer A and Computer B" (it is built on top of IP). TCP/IP is a beautiful engineering achievement, documented beautifully in TCP/IP Illustrated, Volume 1 (W. Richard Stevens 1994; Addison-Wesley).
Transaction
A transaction is a set of operations for which it is important that all succeed or all fail. On an ecommerce site, when a customer confirms a purchase, you'd like to send an order to the shipping department and simultaneously bill the customer's credit card. If the credit card can't be billed, you want to make sure that the order doesn't get shipped. If the shipping database can't accept the order, you want to make sure that the credit card doesn't get billed. RDBMSes such as Oracle provide significant support for implementing transactions.
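The all-or-nothing behavior can be sketched with Python's built-in sqlite3 module standing in for Oracle. The table names are hypothetical, and a CHECK constraint on the charge amount stands in for a credit card that cannot be billed. Both tables live in one database here, which is a simplification of the ecommerce scenario above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipping_orders (customer TEXT NOT NULL, item TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE card_charges ("
    "customer TEXT NOT NULL, amount INTEGER NOT NULL CHECK (amount > 0))"
)
conn.commit()


def confirm_purchase(customer, item, amount):
    """Record the order and the charge together, or neither of them."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "INSERT INTO shipping_orders VALUES (?, ?)", (customer, item)
            )
            conn.execute(
                "INSERT INTO card_charges VALUES (?, ?)", (customer, amount)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = confirm_purchase("alice", "tripod", 120)
failed = confirm_purchase("bob", "lens", 0)  # charge rejected, so no order either
orders = conn.execute("SELECT customer FROM shipping_orders").fetchall()
print(ok, failed, orders)
```

Bob's order row was inserted before the charge failed, yet it does not survive: the rollback undoes it along with everything else in the transaction.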
Unix
An operating system developed by Ken Thompson and Dennis Ritchie at Bell Laboratories in 1969, vaguely inspired by the advanced MULTICS system built by MIT. Unix really took off after 1979, when Bill Joy at UC Berkeley released a version for Digital's VAX minicomputer. All the competent computer hackers used to hate Unix but through some combination of Unix being enhanced and the rest of the world slipping into darkness (Windows, Mac OS) the Unix haters have all lumped it and brought Linux boxes into their homes.
URL
Uniform Resource Locator. A way of specifying the location of something on the Internet, e.g., "http://www.photo.net/wtr/thebook/glossary.html" is the URL for this glossary. The part before the colon specifies the protocol (HTTP). Legal alternatives include encrypted protocols such as HTTPS and legacy protocols such as FTP, news, gopher, etc. The part after the "//" is the server hostname ("www.photo.net"). The part after the next "/" is the name of the file on the remote server.
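The same dissection can be done mechanically. A minimal sketch using urlsplit from Python's standard urllib.parse module, applied to the URL of this glossary:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.photo.net/wtr/thebook/glossary.html")

print(parts.scheme)    # the protocol, before the colon
print(parts.hostname)  # the server hostname, after the "//"
print(parts.path)      # the file on the remote server
```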
USENET
A threaded discussion system that today connects millions of users from around the Internet into newsgroups such as rec.photo.equipment.35mm. The original system was built in the late 1970s and ran on one of the wide-area computer networks later subsumed into the Internet.
WebTV
A pioneering Internet appliance, based on the premise that a consumer would be delighted to enjoy email and Web browsing without having to suffer with the complexity and system administration overhead of running a Microsoft operating system. WebTV also provided an illustration of the staying power of unregulated monopolies as Microsoft used its supranormal profits from desktop applications to acquire WebTV along with Hotmail, two of the best examples of a future beyond Windows.
Windows NT
A real operating system that can run the same programs with more or less the same user interface as the popular Windows 95/98 system. Windows NT was developed from scratch by a programming team at Microsoft that was mostly untainted by the people who brought misery to the world in the form of Windows 3.1/95. The final system works surprisingly well, though not as reliably as moldy old Unix. See the server chapter for a comparison between these two warhorses.
WYSIWYG
What You See Is What You Get. A WYSIWYG word processor, for example, lets a user view an on-screen document as it will appear on the printed page, e.g., with text in italics appearing on-screen in italics. This approach to software was pioneered by Xerox Palo Alto Research Center in the 1970s and widely copied since then, notably by the Apple Macintosh. WYSIWYG is extremely effective for structurally simple documents that are printed once and never worked on again. WYSIWYG is extremely ineffective for the production of complex documents and documents that must be maintained and kept up-to-date over many years. Thus Quark Xpress and Adobe Frame facilitated a tremendous boom in desktop publishing while Microsoft FrontPage and similar WYSIWYG tools for Web page development have probably hindered development of interesting Web services.
XML
Extensible Markup Language, a simplified version of SGML with enhanced features for defining hyperlinks. As with SGML, it solves the trivial problem of defining a syntax for exchanging structured information but doesn't do any of the hard work of getting users to agree on semantic structure.
