Pig racing at the New Jersey State Fair 1995.  Flemington, New Jersey.

The Web Tools Review on Servers

mostly by Philip Greenspun for the Web Tools Review (and as part of a real dead trees book that you can buy right now from amazon.com)

Playing blackjack in Atlantic City (New Jersey)

There are three levels at which you can take responsibility for your Web site: your site can live on a machine that someone else owns and administers, on your own machine parked inside someone else's network, or on your own machine inside your own network.

If your Web site is simply on a remote machine that someone else administers, then your only responsibility is periodically transferring your static files there, and perhaps some CGI scripts. As soon as the remote server is behaving the way you want, you can walk away until it is time to update your site. You can go away on vacation for two months at a stretch. If you need expensive software, such as a relational database, you can simply shop for a site-hosting service that includes use of that software as part of a package.

The downside of using someone else's Web server is that you are entirely at the mercy of the system administrators of the remote machine. If e-mail for your domain isn't being forwarded, you can't get your own consultant or go digging around yourself; you have to wait for people who might not have the same priorities that you do. If you are building a sophisticated relational database-backed site, you might not have convenient access to your data. Competent providers will usually manage a domain-level site for $100 to $200 per month.

If you are the owner of a Web-serving computer inside someone else's network then you have total freedom to make changes to your configuration and software. You'll have root password and shell access and therefore can use the machine for software development or other experiments. Whoever is hosting your box is responsible for network connectivity. If packets aren't getting to your machine, they'll probably notice and do something about it. You don't have to pay the $2,500 per month cost of a T1 line yourself.

The downside to running your own box is that you have to carefully watch over your computer and Web server program. Nobody else will care about whether your computer is serving pages. You'll have to carry a pager and sign up for a service like Uptime that will email you and beep you when your server is unreachable. You won't be able to go on vacation unless you find someone who understands your computer and the Web server configuration to watch it. Unless you are using free software, you may have to pay shockingly high licensing fees. Internet service providers charge between $250 a month and $2,000 a month for physical hosting, depending on who supplies the hardware, how much system administration the ISP performs, and how much bandwidth your site consumes.

If you are the owner of a machine inside your own network then you can sit right down at the console of your Web server and do software development or poke through the database. This can be a substantial convenience if you are running an RDBMS-backed Web site and there is no obvious way to have development and production sites. If you are already paying to have a high-speed network connection to your desktop then the marginal cost of hosting a server this way may be zero (a fact that was not lost on university students throughout the early years of the Web). The downside is that you have all of the responsibilities and hassles of owning a server physically inside someone else's network, plus all of the responsibilities and hassles of keeping the network up.

My Personal Choice

Muscle Beach.  Venice Beach, California.

Which hosting option did I choose for my personal Web site? The last one, of course. At MIT we don't let random commercial losers host our Web sites. We've had a hardwired network since the 1960s, so of course we always do things at the highest level of professionalism. In fact, during the "papal visit" (when Bill Gates came to speak), the director of our lab took particular care to note that MIT was "the home of the Web."

He probably hadn't read this problem report that I had sent to some folks with whom I share a Web server:

1) Saturday, 6:30 am: we had a 2-second power glitch
2) we do not have an uninterruptible power supply for Martigny
   [HP Unix file server for a cluster of user machines] so it crashed
3) we do not have an uninterruptible power supply for Swissnet
   [swissnet.ai.mit.edu, our Web server, an antique HP Unix
   workstation] so it crashed
4) there is something we never figured out about Swissnet so
   that it doesn't boot properly after a power interruption
5) Saturday, 4 pm: I went down to the 3rd floor and manually
   instructed Swissnet to boot from its root disk, thus ending
   almost 10 hours of off-the-Web time
6) Somewhere along the line, Tobler [one of the user machines
   managed from Martigny] tried to reboot.  Because it couldn't
   get to Martigny, it booted from a locally attached disk. 
   This disk was the old Swissnet root disk [we'd hooked it up
   to Tobler after upgrading from HP-UX 9.x to 10.10 because we
   thought we might need some files from it].  Tobler consequently
   advertised itself as "18.23.0.16" [Swissnet's IP address].
7) Saturday, 10:30 pm: Radole [the main router for the MIT
   Laboratory for Computer Science] saw that there were two
   computers advertising themselves as 18.23.0.16 and apparently
   decided to stop routing to the physical Swissnet on the 3rd floor
8) Sunday, 4 pm:  I arrive at work to a phone message from Brian: 
   "Swissnet's routing is hosed".  I reboot the machine.  No
   improvement.  I page George Rabatin [LCS network administrator
   and the Radole guru].
9) Sunday, 5 pm:  George figures out that the problem is
   Tobler's false advertising.  We turn Tobler off.
10) Sunday, 9 pm:  George has manually purged all the caches on
    Radole and I've rebooted Swissnet but still no routing.
11) Sunday, 11 pm:  George Rabatin declares a "network emergency"
    with the main MIT Net administrators so that they can probe
    the Building 24 FDDI router [FDDI is the 100 Mbit/second token
    ring that serves the entire MIT campus.]
12) Sunday, midnight: One of the MIT guys manually flushed the
    ARP cache on the FDDI router and Swissnet instantly came
    back into existence.  Given that Tobler wasn't on the same
    subnet and that Radole supposedly stopped doing proxy ARP
    around seven months ago, it is a mystery to me how this
    router could have had an ARP entry (mapping IP address to
    physical Ethernet hardware address) for Faux Swissnet. 
    But it apparently did.  So we're back on the Web.
Good news:  We saved $500 by not buying two uninterruptible power
supplies.  We found out that George Rabatin is a hero.
Bad news: We probably denied services to about 5000 users over 34
hours.  We burned up about 20 person-hours of various folks' time
on a Sunday trying to fix a problem that we created.

That's how professionals do things . . .

[Note: Partly because of incidents like this, I decided to park my personal database server at above.net. Five days after I flew this machine out to California (using an airline's counter-to-counter package service)...

From: William Wohlfarth <wpwohlfa@PLANT.MIT.EDU>
Subject: Power Outage August 7 , 1997
Date: Fri, 8 Aug 1997 07:39:43 -0400

At approximately 5:35pm on August 7, 1997 a manhole explosion in the
Kendall Sq area caused Cambridge Electric to lose Kendall Station. MIT
lost all power and the gas turbine tripped. Power was fully restored at
7pm.

At approximately 7:05pm, a second manhole explosion caused 1 fatality
and injuries to 4 other Cambridge Electric utilitymen, including a
Cambridge Policeman. Putnam Station was also tripped and MIT lost all
power again.

At approximately 10:30pm, MIT had power restored to all buildings within
our distribution system. Several East campus (E28, E32, E42, E56, E60,
NE43) and North/Northwest buildings (N42, N51/52, N57, NW10, NW12, NW14,
NW15, NW17, NW20, NW21, NW22, NW30, NW61, NW62, W11, WW15), which are fed
directly from Cambridge Electric, were restored by Cambridge Electric
personnel.

Cambridge Electric is still sorting out the chain of events. At last
discussions with them, a total of 3 manhole explosions had taken place.
Additional information will be posted when available.

The battery farm and gas-fired backup generator at above.net no longer seemed like paranoia.]

Choosing a Computer

Computers are the tools of the devil. It is as simple as that. There is no monotheism strong enough that it cannot be shaken by Unix or any Microsoft product. The devil is real. He lives inside C programs.

Hardware engineers have done such a brilliant job over the last 40 years that nobody notices that, in the world of commercial software, the clocks all stopped in 1957. Society can build a processor for $50 capable of executing 200 million instructions per second. Marvelous. With computers this powerful, the amazing thing is that anyone still has to go into work at all. Perhaps part of the explanation for this apparent contradiction is that, during its short life, the $50 chip will consume $10,000 of system administration time.

Everything that I've learned about computers at MIT I have boiled down into three principles:

In theory, a Macintosh or Windows 95 machine could function as a low-volume Web server for static files. However, since those operating systems lack multiprocessing and memory protection, you'd have to dedicate an entire machine to this task. Finally, you'd never be able to do anything interesting with a Mac or Win95-hosted site because relational database management systems such as Oracle require a multi-tasking operating system.

Note: The NeXT operating system is based on Mach, a reimplementation of Unix created in the mid-1980s at Carnegie-Mellon University. So if (when?) Apple ships a Macintosh running the NeXT operating system, it will basically be a Unix box (see below) for the purposes of Web service.

Most people buying a server computer make a choice between Unix and Windows NT. These operating systems offer important 1960s innovations like multiprocessing and protection among processes. Certainly the first thing to do is figure out which operating system supports the Web server and database management software that you want to use (see my book for more on that topic). If the software that appeals to you runs on both operating systems, then make your selection based on which computer you, your friends, and your coworkers know how to administer. If that doesn't result in a conclusion, then read the rest of this section.

Unix

Buying a Unix machine guarantees you a descent into Hell. It starts when you plug the computer in and it won't boot. Yes, they really did sell you a $10,000 computer with an unformatted disk drive. There is only one operating system in the world that will run on your new computer, but the vendor didn't bother to install it. That's how you are going to spend your next couple of nights. You'll be asked dozens of questions about disk partitioning and file system journaling that you couldn't possibly answer. Don't worry, though, because Unix vendors have huge documentation departments to help you. Unfortunately, your computer shipped without any documentation. And, although the marketing department has been talking about how this vendor is God's gift to the Internet, the rest of the company still hasn't jacked into this World Wide Cybernet thing. So you won't find the documentation on the Web.

So you decide to save some trees and order a documentation CD-ROM. You plug it into your nearest Macintosh or PC and . . . nothing happens. That's right, the documentation CD-ROM isn't usable unless you have a completely working Unix computer made by the same company.

A week later, you've gotten the machine to boot and you call over to your Web developer: "Set up the Web server." But it turns out that he can't use the machine. Everything in Unix is configured by editing obscure incantations in text files. Virtually all competent Unix users edit text in a program called Emacs, probably the best text editor ever built. It is so good that the author, Richard Stallman, won a MacArthur genius fellowship. It is also free. But that doesn't mean that it meets the standards of Unix vendors. No, the week-long installation process has left you only with vi, an editor that hardly anyone worth hiring knows how to use.

So you download the Emacs source code over the Internet and try to compile it. Good luck. Your computer didn't come with a compiler. The most popular C compiler for Unix is GCC, another free program from Richard Stallman. But it would have been too much trouble for the vendor to burn that onto their software CD-ROM, so you don't have it.

At this point you are in serious enough trouble that you have to hire a $175-per-hour consultant just to make your computer function. Two days and $4,000 later, your computer is finally set up the way a naïve person would assume that it would have shipped from the factory.

That's what setting up a Unix box is like. If it sounds horribly painful, rest assured that it is. The reason that anyone buys these computers is that usually they are administered in clusters of 100 machines. The time to administer 1,000 Unix boxes is about the same as the time to administer one and therefore the administration cost per machine isn't ruinous. This will be cold comfort to you if you only have the one Web server, though.

There is an upside to all of this. The operating system configuration resides in hundreds of strangely formatted text files. During the week you spent setting up Unix, you cursed this feature. But once your system is working, it will continue working forever. As long as you don't go into Emacs and edit any of those configuration files, there is no reason to believe that your Unix server won't function correctly. It isn't like the Macintosh or Windows worlds where things get silently corrupted and the computer stops working.

Which Brand of Unix Box?

Hewlett-Packard makes the fastest and most reliable Unix computers. You would think that Unix would be impossible to support because different sites have completely different configurations. Nonetheless, I've found that the HP support people can usually telnet into my machines and fix problems over the network themselves. If you call them at 1 a.m., you'll be working with an engineer in Australia. If you call at 4 a.m., you'll be working with their staff in England.

Silicon Graphics seems to be a popular choice among my friends who run huge multiprocessor servers. Digital Alpha servers are very popular with those who have big relational database management systems.

The main problem with all of these kinds of Unix is that the latest and greatest Web server software either won't be available for your computer or it won't really have been tested. Unix is not a standard. A program that works on HP's Unix will not work on Silicon Graphics's Unix. If you want to pull programs off the Net and have them just work, the best kind of Unix to have is SPARC/Solaris from Sun (when I went out to spend my own cash on a Web server, I bought an UltraSPARC).

Truly sophisticated Unix people seem to run Unix on standard PC hardware. The most popular Unix for PCs is Linux, an entirely free operating system. You can download it off the Net for nothing. Or you can pay $50 to a company like Red Hat (www.redhat.com) for a CD-ROM. After making a few mouse clicks in the installer, your PC will be running a complete Unix environment, and it can do all of this on several CPUs simultaneously if you recompile the kernel for symmetric multiprocessing.

Running a free Unix on a PC entails a different philosophy from buying hardware and software from the same company. You are abandoning the fantasy that there is a company out there who will support you if only you give them enough money. You or someone you hire will take a little more responsibility for fixing bugs. You have the source code, after all. If you want support, you have to make an intelligent decision about who can best provide it.

PC hardware is so much cheaper than workstation hardware that for the price of one regular Unix workstation you will probably be able to buy two complete PC systems, one of which you can use as a hot backup. Keep in mind that 99 percent of PC hardware is garbage. A friend of mine is a small-time Internet service provider. He runs BSDI on a bunch of PC clones. A hard disk was generating errors. He reloaded from backup tape. He still got errors. It turned out that his SCSI controller had gone bad some weeks before. It had corrupted both the hard disk and the backup tapes. He lost all of his data. He lost all of his clients' data.

What stops me from running Linux is not fear of unreliable hardware or divided responsibility for software and hardware bugs. It is that big software companies don't trust this model or market. You can't get Adobe PhotoShop for Linux. You can't buy the Netscape Web servers for Linux. Free software often works better than commercial solutions, but not always. I have more software options with my HP-UX and SPARC/Solaris machines even if they are much harder to set up.

I'll leave you with an anecdote about my desktop Hewlett-Packard 715/80 Unix workstation. Let's see how long it has been up:

orchid.lcs.mit.edu 33: uptime
 11:30pm  up 53 days, 10:28,  5 users,  load average: 0.10, 0.07, 0.06

Yes, that's 53 days, 10 hours, 28 minutes. It is running a relational database management system. It is serving ten hits per second to the Web from an AOLserver process listening on four different IP addresses. It is running the X Window System so that I can use it from my home computer just as easily as if I were on campus at MIT. Yet though this is an old computer, not nearly as powerful as a Pentium Pro that a child might find under a Christmas tree, it is doing all of this while working only about one-tenth of the time. That's what a load average of 0.10 means.

Something Nice About Solaris and Sun

I used to make a specialty of beating up Sun and Solaris. Usually I'd write something nasty for Web Tools Review at 3 a.m. after being tortured by flakey hardware and obscure Unix problems. Most of that stuff dates from 1995 and I didn't get around to removing it. Since then, though, I've served about two million RDBMS-backed hits/day off various Solaris machines with very little Sturm und Drang. I would type "mpstat 5" and find that the load of a Web server program and RDBMS was being distributed almost perfectly evenly among however many CPUs the computer had.

Another reason that I think I like Solaris better nowadays is that I'm no longer playing Caveman Unix System Administrator. I've teamed up with Moses Merchant at Hearst and Jin Choi and Cotton Seed at ArsDigita. I've personally purchased about $20,000 of Sun hardware and plugged it together, and I've done things like order backup tape from Sun Express (800-873-7869). They had it at a lower price than "back of the magazine sleaze shop" vendors. They shipped it overnight. They didn't charge me for shipping.

My best Sun story is that I ordered some disk drives from them for MIT. They came in 8 boxes via FedEx and were signed for by an MIT employee. By the time I found the boxes outside my office, two were missing. I'm pretty sure that they were stolen and certainly Sun was off the hook. FedEx said that Sun shipped 8 boxes and 8 boxes were signed for by MIT. The MIT receiving folks couldn't remember whether there had been six boxes or the full eight as per the airbill. I sent Sun email asking them to double-check how many disks had been shipped and whether in fact drives with those serial numbers had left their inventory. I did not ask them to replace the drives because I was pretty sure that it wasn't their fault. They replaced the drives without my asking! I just arrived at MIT one day to find two more boxes from Sun. What makes all of this more remarkable is that I'd bought everything at the MIT price which is lower than the "back of the magazine sleaze shop" price. The disks are manufactured by Seagate and I don't think Sun makes any profit by reselling them at their university prices.

Anyway, I always liked Solaris because it is the operating system for which most Web server and RDBMS software is released first. When the UltraSPARC came out, I started to like the performance. Sometime in 1997, I started to like the company. Right now, I think a genuine Sun-brand computer running Solaris is the best choice for a Web publisher trying to do something ambitious.

[Of course, as soon as I finished writing the above, the Sun Ultra 2 on which I was depending racked up 10 catastrophic hardware failures in 6 months. That was more hardware failures than we see on clusters of 25 HP Unix workstations in 5 years. Each hardware failure resulted in a clueless board-swapper being dispatched from the phone company (Sun contracts out most of its hardware service). "Iz video card," said one of these guys in a heavy Russian accent. "I have seen ziz before." (It was the power supply.)

A typical experience was Friday, September 5, 1997. A Sun service guy came out to swap the main system board at 3 pm. The swap went so smoothly that my partner Cotton said "I can't believe it. I bet I go home tonight and find that my house has been burned down." At 6 am, the above.net monitors began sending us email complaining that they couldn't ping our machine. We were in bed so we didn't answer the email and the above.net guys decided to telephone us. It was a Saturday, outside of Sun's normal service hours unless you have Platinum support. We called Sun and agreed to pay a $300 uplift fee (for same-day service) plus $275/hour (2 hours minimum). They spent the next 12 hours telephoning us to ask for the physical location of the machine (a machine that they'd serviced less than 24 hours previously), asking for our credit card number three times, and promising to show up Real Soon Now (TM). Eventually they came back and swapped the system board again and we were on the Internet. Total outage: 13 hours.

In the end, the bottom line on Sun is that between the money we've had to refund to our hosting customer and the money we're going to be billed just for this one board swap, we could have purchased an HP Unix workstation.]

Windows NT

When I was a kid I didn't like the taste of Coca-Cola. After I'd been exposed to 100,000 TV commercials and billboards for Coke, I decided that it was the best drink in the world. Just opening a can made me feel young, good-looking, athletic, surrounded by gorgeous blondes.

Windows NT is sort of like that. At first glance it looks like a copy of Unix (which in turn was a copy of operating systems from the 1960s) with a copy of the Macintosh user interface (which in turn was based on systems developed at Xerox PARC in the 1970s). I didn't think much more about it.

Eventually the Microsoft PR mill convinced me that Windows NT was the greatest computer innovation ever. Bill Gates had not only invented window systems and easy to use computers, but also multitasking, protection among processes, and networking. It would be like Unix without the obscurity.

I told all of my friends how they were losers for running Unix. They should switch to NT. It was the future. That was more or less my constant refrain until one pivotal event changed my life: I actually tried to use NT.

Having once watched three MIT wizards each spend ten hours installing a sound card in a PC, I was in no mood to play with clones. I got myself an Intel Inside and Intel Outside genuine Intel-brand PC. I reformatted the hard drive with NT File System (NTFS) and installed WinNT Server. The machine booted smoothly but running any program triggered Macintosh emulation mode: You move the mouse but nothing happens on the screen.

I spent two weeks trying to figure out why the user interface was crashing, reinstalling NT several times. I enlisted the help of a professional NT administrator. He tried eight different combinations of file systems but none of them worked.

What I learned: Do not buy a computer that isn't "NT certified." In fact, don't buy one unless the vendor has already installed the version of NT that you intend to use. Personally I'd buy a Hewlett-Packard system.

I started over with a Pentium Pro 200 and NT 4.0 Server. The operating system installed flawlessly, but my MIT undergrad PC wizard never could get the machine to execute a CGI script from either the Netscape FastTrack Server or the included Microsoft Web server. It took several weeks and three MIT wizards to get the machine to talk to the HP LaserJet down the hall.

NT Success Story 1: I downloaded some fax software from Microsoft. It was never able to talk to my modem or send a fax. But it did consume 20 percent of the machine's CPU and grew to 30MB in size. The standard system monitoring tools that come with NT make it almost impossible to figure out what is killing one's machine.

NT Success Story 2: I bought an HP LaserJet 5M printer for my house so that I might be the first one on my block with a duplexing printer. It took me 5 seconds to get my Macintosh to print to it. It took me 5 minutes to get all of the Unix boxes at MIT to print to it. It took me 5 hours to get my Windows NT computer to recognize this printer, even though both were on the same Ethernet wire. As part of my 5-hour saga, I had to download a 4MB program from the HP Web site. Four megabytes. That's larger than any operating system for any computer sold in the 1970s. Mainframes in the 1970s could run entire airlines with less than 4MB of code.

Unix versus NT

Table 6.1 is a summary chart of the differences between Unix and NT Web servers.

Criterion                   Unix                                      NT
---------------------------------------------------------------------------------------
Easy to maintain remotely   Yes                                       No
Consultants                 Cheap and smart                           Expensive and stupid
Price of software           Free or expensive                         Cheap
Reliability                 High                                      Medium
Support                     Depends on vendor; sometimes excellent    Microsoft
Price of hardware           Cheap with Linuxes; expensive with        Cheap
                            other Unices

Atlantic City (New Jersey)

Most of the people I know who are facile with both NT and Unix have eventually taken down their NT Web servers and gone back to Unix. The WebCrawler's comprehensive statistics, gathered as it indexes the Web, confirm my anecdotal evidence: As of January 1997, Unix sits behind 84 percent of the world's Web sites; NT sits behind 7 percent.

It turns out that, once you get to a certain level of traffic, you want your Web server in a closet right up against the routers that carry bits out of your building. You might think that the user interface of Unix sucks. But, thanks to X, it doesn't get any worse if you stay in your comfortable office or cozy house and drive your Web server remotely. Any program that you can run on the console, you can run from halfway around the world. Most sysadmins don't even go up to the physical machine to reboot their Unix boxes.

Unless you are a lot smarter than anyone I know, you will need consultants. You're buying into a user community when you buy into an operating system. A big part of the Unix user community consists of the smartest and poorest paid people in the world: science and engineering graduate students. Moreover, these people are used to helping each other over the Net, usually for no money. When I'm running a Unix program at a commercial site and want an enhancement, I send the author e-mail asking if he'll make the changes for $200 or so. Since most such requests come from users at universities who can't offer any money, this kind of proposal is invariably greeted with delight. When I'm confronted with a useless Unix box that doesn't have Emacs on it, I get the client to hire a friend of mine in Texas to install it. He telnets in and lets the compiler run while he's answering his e-mail in another window.

By contrast, anyone who has learned to install Microsoft Word on a Windows NT machine is suddenly a $150 an hour consultant. Unless you count nerdy high school kids, there is no pool of cheap expertise for NT. And because NT boxes are tough to drive remotely, a wizard at another location can't help you out without disturbing his daily routine.

There is no technical reason why it couldn't have been the other way around, but it isn't. A true Windows NT wizard is making $175,000 a year maintaining a financial firm's servers; he isn't going to want to bother with your Web server.

Software licensing can be much more expensive with Unix. True, much of the best and most critical software is free. But many software firms have figured out that if you were stupid enough to pay $10,000 for a Sun SPARCstation 5 that is slower than your next-door neighbor's 7-year-old's Pentium 166 then you are probably stupid enough to pay three times as much for the same software. If you intend to purchase a lot of commercial software for your Web server, it is probably worth checking vendor price lists first to make sure that you couldn't pay for the entire NT machine with the Unix/NT license fee spread.

Note: Web server software and relational database management systems seem to be two areas where NT and Unix pricing are often the same.

Unix wizards love to tell horror stories about Unix in general and Solaris in particular. That's usually because the best of them were accustomed to the superior operating systems of the 1960s and 70s that Unix replaced. But the fact remains that Windows NT is less reliable than Unix and has more memory leaks. In the Microsoft culture it is amazing when a computer stays up and running for more than one day so nobody complains if it takes them two months to make Oracle work or if the NT server has to be rebooted once a week.

Support can be much better with Unix. The whole idea of the Apple and Microsoft support 800 number doesn't make any sense in an Internet age. Why are you talking into a telephone telling someone what text is appearing on your screen? Your computers are both on the Internet and capable of exchanging data at perhaps 500,000 bps. I'm not so sure about the other Unix vendors but I know from personal experience that Hewlett-Packard has figured this out. Plus you actually get better support when you dial in at 4 a.m. because the kind of people willing to take a tech support job in England are much more able than the kind of people willing to take a tech support job in California. Keep in mind that support does not have to be much better with Unix. I've personally never gotten any useful assistance from the official Sun support apparatus.

That's about as much as I can say. I don't think that there is a universal truth for making the NT/Unix choice other than my original one: Computers are tools of the Devil. I learned that from a tenured professor in computer science at MIT. I think he is still trying to get his Macintosh to stop crashing.

Final Hardware Selection Note

Whatever server computer you buy, make sure that you get an uninterruptible power supply and mirrored disks. You should not go offline because of a power glitch. You should not go offline because of a disk failure. If you do decide to take the Unix plunge, lay in a big stock of books from O'Reilly (http://www.ora.com).

Server Software

Once you have bought a Web server computer, you need to pick a Web server program. The server program listens for network connections and then delivers files in response to users' requests. The most important factors in choosing a program are its API, its RDBMS connectivity, the availability of support and source code, the availability of shrink-wrapped plug-ins, and its speed.

Each of these factors is elaborated below.

API

Unless your publishing ambition is limited to serving static files, you will eventually need to write some programs for your Web site. It is quite possible that you'll need to have a little bit of custom software executing every time a user requests any file. Any Web server program can invoke a Common Gateway Interface (CGI) script. However, CGI scripts impose a tremendous load on the server computer. Furthermore, an all-CGI site is less straightforward for authors to maintain and for search engines to search than a collection of HTML files.
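
For concreteness, here is a minimal sketch of a CGI script, written in Tcl only because Tcl comes up again below (any language works, and the interpreter path in the first line is whatever your system uses). The point to notice is that the server must launch a fresh copy of this program for every single request, which is where the load comes from.

  #!/usr/bin/tclsh
  # Minimal CGI script (hypothetical).  The Web server starts a new
  # copy of this program for every request; that per-request process
  # creation is the overhead referred to above.

  # CGI passes request data to the script in environment variables.
  set query ""
  if {[info exists env(QUERY_STRING)]} {
      set query $env(QUERY_STRING)
  }

  # A CGI response is a header block, a blank line, then the document.
  puts "Content-type: text/html"
  puts ""
  puts "<html><body>"
  puts "<p>You asked for: $query</p>"
  puts "</body></html>"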

A Web server API makes it possible for you to customize the behavior of a Web server program without having to write a Web server program from scratch. In the early days of the Web, all the server programs were free. You would get the source code. If you wanted the program to work differently, you'd edit the source code and recompile the server. Assuming you were adept at reading other people's source code, this worked great until the next version of the server came along. Suppose the authors of NCSA HTTPD 1.4 decided to organize the program differently than the authors of NCSA HTTPD 1.3. If you wanted to take advantage of the features of the new version, you'd have to find a way to edit the source code of the new version to add your customizations.

An API is an abstraction barrier between your code and the core Web server program. The authors of the Web server program are saying, "Here are a bunch of hooks into our code. We guarantee and document that they will work a certain way. We reserve the right to change the core program but we will endeavor to preserve the behavior of the API call. If we can't, then we'll tell you in the release notes that we broke an old API call."

An API is especially critical for commercial Web server programs whose vendors don't release the source code at all. Here are some typical API calls from the AOLserver documentation (http://www.aolserver.com):

ns_user exists user
    Returns 1 (one) if the specified user exists and 0 (zero) if it does not.

ns_sendmail to from subject body
    Sends a mail message.

The authors of AOLserver aren't going to give you their source code and they aren't going to tell you how they implement the user/password database for URL access control. But they give you a bunch of functions like "ns_user exists" that let you query the database. If they redo the implementation of the user/password database in the next release of the software then they will redo their implementation of "ns_user" so that you won't have to change your code. The "ns_sendmail" API call not only shields you from changes by AOLserver programmers, it also allows you to not think about how sending e-mail works on various computers. Whether you are running AOLserver on Windows NT, HP Unix, or Linux, your extensions will send e-mail after a user submits a form or requests a particular page.
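
As a rough illustration of what this buys you, here is a hypothetical AOLserver Tcl page that mails the contents of a submitted comment form and then returns a confirmation page. The field names and addresses are made up, and I'm assuming only that ns_getform, ns_set get, ns_sendmail, and ns_return behave as documented.

  # confirm-comment.tcl -- hypothetical handler for a comment form.
  # Field names and addresses below are invented for illustration.
  # (A real page would check that a form was actually submitted.)

  set form    [ns_getform]                 ;# ns_set of submitted fields
  set email   [ns_set get $form "email"]
  set comment [ns_set get $form "comment"]

  # One portable API call sends the mail, whatever OS the server runs on.
  ns_sendmail "webmaster@yourdomain.com" $email \
      "New comment from $email" $comment

  # Return a confirmation page to the user.
  ns_return 200 text/html "<html><body>
  <h2>Thank you</h2>
  Your comment has been forwarded to the webmaster.
  </body></html>"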

Aside from having a rich set of functions, a good API has a rapid development environment and a safe language. The most common API is for the C programming language. Unfortunately, C is probably the least suitable tool for Web development. Web sites are by their very nature experimental and must evolve. C programs like Microsoft Word remain unreliable despite hundreds of man-years of development and thousands of man-years of testing. A small error in a C subroutine that you might write to serve a single Web page could corrupt memory critical to the operation of the entire Web server and crash all of your site's Web services. On operating systems without interprocess protection, such as Windows 95 or the Macintosh, the same error could crash the entire computer.

Even if you were some kind of circus freak programmer and were able to consistently write bug-free code, C would still be the wrong language because it has to be compiled. Making a small change in a Web page might involve dragging out the C compiler and then restarting the Web server program so that it would load the newly compiled version of your API extension.

By the time a Web server gets to version 2.0 or 3.0, the authors have usually figured that C doesn't make sense and have compiled in an interpreter for Tcl, Java byte codes, or JavaScript.

RDBMS Connectivity

You've chosen to publish on the Web because you want to support collaboration among users and customize content based on each individual user's preferences and history. You see your Web site as a lattice of dazzling little rubies of information. The Unix or Windows NT file system, though, only understands burlap sacks full of sod. As you'll find out when you read my chapters on building database-backed Web sites, there aren't too many interesting things that you can implement competently on top of a standard file system. Sooner or later you'll break down and install a relational database management system (RDBMS).

You'll want a Web server that can talk to this RDBMS. All Web servers can invoke CGI scripts that in turn can open connections to an RDBMS, execute a query, and return the results formatted as an HTML page. However, some Web servers offer built-in RDBMS connectivity, with which the same project can be accomplished with much cleaner and simpler programs and a tenth of the server resources.

Support and Source Code Availability

Most computer programs that you can buy in the 1990s are copies of systems developed in the 1960s. Consider the development of a WYSIWYG word processor. A designer could sit down in 1985 and look at ten existing what-you-see-is-what-you-get word processors: Xerox PARC experiments from 1975, MacWrite, workstation-based systems for documentation professionals (such as Interleaf). He would not only have access to the running programs but also to user feedback. By 1986 the designer hands off the list of required features to some programmers. By 1987, the new word processor ships. If enough of the users demand more sophisticated features, the designers and programmers can go back to Interleaf or Frame and see how those features were implemented. Support consists of users saying "it crashes when I do x," and the vendor writing this information down and replying "then don't do x." By 1989, the next release of the word processor is ready. The "new" features lifted from Interleaf are in place and "doing x" no longer crashes the program.

Does this same development cycle work well for Web server programs? Although the basic activity of transporting bits around the Internet has been going on for three decades, there was no Web at Xerox PARC in 1975. There is no one designer who can anticipate even a fraction of user needs. Web publishers cannot wait years for new features or bug fixes.

An important feature for a Web server is source code availability. If worst comes to worst, you can always get a wizard programmer to extend the server or fix a bug. Vendor indifference cannot shut down your Web site. That doesn't mean you should ignore commercial servers that are only available as binaries. They may offer features that let you build a sophisticated site in a fraction of the time it would take with a more basic public-domain server.

If you can't get source code then you must carefully consider the quality of the support. What is the culture of the vendor like? Do they think, "We know a lot more than our users and every couple of years we'll hand them our latest brilliant innovation" or "We have a lot to learn from our users and will humbly work to meet their needs"? A good vendor knows that even a whole company full of Web wizards can't come up with all the good ideas. They expect to get most of their good ideas from working with ambitious customers. They expect to deliver patched binaries to customers who find bugs. They expect to make a customer problem their own and keep working until the customer is online with his publishing idea.

Availability of Shrink-wrap Plug-ins

Are your ideas banal? Is your Web site like everyone else's? If so, you're a good candidate for shrink-wrapped software. In a field changing as rapidly as Web publishing, packaged software usually doesn't make anyone's life easier. Sometimes a $500 program is helpful but the grand $50,000 package ends up being a straitjacket because the authors didn't anticipate the sorts of sites that you'd want to build.

Still, as the Web matures, enough commonality among Web sites will be discovered by software vendors to make shrink-wrapped software useful. An example of a common need is "I just got a credit card from a consumer and I want to bill it before returning a confirmation page to him." Often these packages can be implemented as CGI scripts suitable for use with any Web server. Sometimes, however, it is necessary to add software to the API of your Web server. If you are using a Web server that is popular among people publishing similar sites then you are more likely to be able to buy shrink-wrapped software that fits into your API.

Speed

Pig racing at the New Jersey State Fair 1995.  Flemington, New Jersey.

It is so easy now to get a high-efficiency server program that I initially thought this point wasn't worth mentioning. In ancient times, the Web server forked a new process every time a user requested a page, graphic, or other file. The second generation of Web servers pre-forked a big pool of processes, e.g., 64 of them, and let each one handle a user. The server computer's operating system ensured that each process got a fair share of the computer's resources. A computer running a pre-forking server could handle at least three times the load. The latest generation of Web server programs uses a single process with internal threads, which resulted in another tripling of performance.

It is possible to throw away 90 percent of your computer's resources by choosing the wrong Web server program. Traffic is so low at most sites and computer hardware so cheap that this doesn't become a problem until the big day when the site gets listed on the Netscape site's What's New page. In the summer of 1996 that was good for several extra users every second at the Bill Gates Personal Wealth Clock (http://www.webho.com/WealthClock). I'm glad now that I had been thinking about efficiency in the back of my mind.

Given these criteria, let's evaluate some of the more intelligent choices in Web server programs.

AOLserver (a.k.a. GNNserver, NaviServer)

America Online doesn't run Internet Protocol among their millions of subscribers. They have a strictly 1960s-style time-sharing model with their own proprietary protocols and software. Yet AOL decided to keep one corporate foot in the 1980s by buying up the best Internet technology companies it could find. One of them was NaviSoft, a Santa Barbara company that made by far the most interesting Web server of 1995: NaviServer. Despite having been subjected to a humiliating series of name changes, AOLserver remains a strong product.

AOLserver has a rich and comprehensive set of API calls. Some of the more interesting ones let your programs send e-mail, fetch pages from other sites on the Web, and talk to relational databases through pooled connections.

These kinds of API calls let you write sophisticated Web/e-mail/database systems that are completely portable among different versions of Unix and Windows NT.

These are accessible from C and, more interestingly, from the Tcl interpreter that they've compiled into the server. I have written thousands of Tcl procedures to extend the AOLserver and have never managed to crash the server from Tcl. There are several ways of developing Tcl software for the AOLserver but the one with the quickest development cycle is to use *.tcl URLs.

A file with a .tcl extension anywhere among the .html pages will be sourced by the Tcl interpreter. So you have URLs like "/bboard/fetch-msg.tcl". If asking for the page results in an error, you know exactly where in the Unix file system to look for the program. After editing the program and saving it in the file system, the next time a browser asks for "/bboard/fetch-msg.tcl" the new version is sourced. You get all of the software maintenance advantages of interpreted CGI scripts without the CGI overhead.
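
A minimal sketch of such a page (the file name and contents are invented): drop it among the .html files, request it by URL, then edit, save, and reload; that is the entire development cycle.

  # hello.tcl -- hypothetical example of a *.tcl URL.
  # AOLserver re-sources this file on every request, so saving an edit
  # and reloading the page in the browser shows the change immediately.

  set now [clock format [clock seconds]]
  ns_return 200 text/html "<html><body>
  <h2>Hello</h2>
  The server's clock reads $now.
  </body></html>"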

The next version of AOLserver (2.2; due June 1997) will include a Java API as well.

Though AOLserver shines in the API department, its longest suit is its RDBMS connectivity. The server can hold open pools of connections to multiple relational database management systems. Your C or Tcl API program can ask for an already-open connection to an RDBMS. If none is available, the thread will wait until one is; then AOLserver hands your program the requested connection, into which you can pipe SQL. This architecture improves Web/RDBMS throughput by at least a factor of ten over the standard CGI approach.
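
Here is a rough sketch of a database-backed page built on these pools, in the spirit of the fetch-msg.tcl URL mentioned above. The table and column names are invented, and I'm assuming only that ns_db gethandle, ns_db 1row, and ns_db releasehandle work as documented.

  # fetch-msg.tcl -- hypothetical sketch of a database-backed page.
  # Table and column names are invented for illustration; a real page
  # would also validate msg_id and handle the no-such-message case.

  set form   [ns_getform]
  set msg_id [ns_set get $form "msg_id"]

  # Grab an already-open connection from AOLserver's pool; the thread
  # waits here if every connection is currently in use.
  set db [ns_db gethandle]

  # Pipe SQL into the connection and read back a single row (an ns_set).
  set row [ns_db 1row $db \
      "select subject, message from bboard where msg_id = $msg_id"]

  ns_db releasehandle $db

  ns_return 200 text/html "<html><body>
  <h3>[ns_set get $row subject]</h3>
  <blockquote>[ns_set get $row message]</blockquote>
  </body></html>"

The key point is that the page never opens its own database connection; it borrows one that the server opened at boot time and gives it right back.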

Support and source code availability are weak points for AOLserver. Though free, AOLserver is a commercial product and AOL won't give out the source code. The documentation and the API are superb so you really shouldn't ever need the source code if everything works as advertised. However, AOLserver is a C program, and, though the developers of AOLserver are probably the best C programmers I've met, no C program ever works as advertised. This is particularly true when the C program is built on the shaky foundation of modern operating systems.

When AOLserver was a $5,000 product, support was amazing. I would complain about a bug and three hours later would receive a patched binary. After the AOL buyout and a reorganization or two, support really suffered. If you were running on SGI Unix, you were golden because that's what Primehost, AOL's commercial Web site hosting arm, runs. Otherwise, good luck to you. They were short-staffed at NaviSoft and didn't feel like writing workarounds for Solaris bugs. Check http://www.aolserver.com/ for the current support situation.

Availability of shrink-wrapped software for the AOLserver is nil. AOLserver has a very small market share and most of the people who run it are capable of writing their own back-end systems. They aren't going to pay $50,000 for a program to serve advertising banners when they could write a few pages of Tcl code to do the same thing more reliably. Of course, AOLserver can run packages of CGI scripts as well as any other Web server so you can still install important packages like the Excite search engine for Web servers.

AOLserver 1.0 was the first of the threaded Web server programs and is therefore right up there with the fastest products on the market. A typical Unix box can serve about 800,000 static hits a day with AOLserver 2.1.

One final note about AOLserver: If you want to exploit its Tcl API and RDBMS connectivity without any sysadmin or dbadmin hassles, then you can pay about $200 a month for a virtual server in someone else's cluster. You get your own domain name, your own database, and redundant T1 or T3 connectivity. ISPs providing this service include Primehost (http://www.primehost.com), AM Computers (http://am.net), and a German outfit, http://www.carpe.net.

Apache

Proceeding alphabetically, we arrive at the most popular Web server, Apache. WebCrawler credits Apache with a 35 percent share of the Web server program market. Apache seems to be used at the very simplest and the very most complex Web sites. The simple users just want a free server for static files. The complex users basically need a custom Web server but don't want to start programming from scratch.

The Apache API reflects this dimorphism. The simple users aren't expected to touch it. The complex users are expected to be C programming wizards. So the API is flexible but doesn't present a very high-level substrate on which to build.

Support for Apache is as good as your wallet is deep. You download the software free from http://www.apache.org and then buy support separately from the person or company of your choice. Because the source code is freely available there are thousands of people worldwide who are at least somewhat familiar with Apache's innards.

Big companies that like to spend big dollars on shrink-wrapped software don't generally trust free software. Hence, I haven't seen too many packages for the Apache API.

Apache is a pre-forking server and is therefore reasonably fast. Bottom line: 80 percent of Web sites have decided that a source-code available server is the right one for them; Apache is the best and most popular of the source-code available server programs.

Netscape Enterprise/FastTrack

About 12 percent of the Internet's sites run various versions of the Netscape servers. That's not a lot better than the ancient CERN server. However, this market share is misleading because it isn't adjusted for the number of hits served. Many of the most heavily accessed and funded sites use the Netscape servers.

The Netscape 2.0 servers have a variety of APIs. They carry over their dangerous C API from their 1.x server programs. There is also a Java API and a JavaScript API (LiveWire). For my taste, any C API is too dangerous. Java is a safe language but it requires compilation and then installation of the compiler output into the server. You can build a Web site this way but it is a bit like digging a flower bed with a backhoe. The Netscape tool that shows the most promise is LiveWire, which attempts to cover some of the same ground as the AOLserver's Tcl API, including efficient RDBMS connectivity.

Let's start by considering the development cycle. Suppose that three graphic designers have been working on a Web site for two months. They've built a directory on their Web server of 30 .html files. Then they call in a programmer to add a dynamic page or two, perhaps one that talks to the database. These pages must be wrapped up into a LiveWire application. A programmer can add JavaScript inside <SERVER> tags to any of the .html files. However, these aren't parsed by the Enterprise Server on the fly. The LiveWire application has to be compiled into a .web file. Then the programmer has to go into the application manager (an administration Web page) and load the application into the Enterprise Server.

Suppose now that a typo is discovered in a dynamic page. The graphic designer edits the file as always and reloads the page. Nothing is changed. That's because the .html file was distilled into a .web compiled object.

So the graphic designer asks the programmer to recompile the application, either from the Unix shell or by figuring out the Site Manager program. Upon reload, the page is . . . unchanged!

The Enterprise Server is a running C program. It has loaded the byte-code compiled .web file into its memory and will not reexamine the .web file ever. So someone has to go into the application manager and say "restart this particular application."

A change that, with AOLserver, would require a few keystrokes in Emacs or Netscape Gold requires three steps with Enterprise/LiveWire:

  1. Edit the .html file.
  2. Recompile the .web file.
  3. Restart the LiveWire app from the appmgr.

Not so great, eh? Well, suppose you can get past the painful development cycle? What about the API per se?

You can't call a function to send e-mail. You can't call a function that will go out on the Net and grab a Web page from another server. Thus, even the simplest AOLserver sites that I've built would not be feasible in LiveWire. However, Netscape is supposedly going to rectify some of these deficiencies in the 3.0 version of the server.

One nice feature about LiveWire is that there is a lot of infrastructure for maintaining per-client state.

Netscape has developed a fairly clean hierarchy of what they call objects (really just data structures). There is a request object that contains data from the form that the user just submitted and other information that might vary per request. There is a client object that contains data intended to persist for a client's session. There is a project object that persists for as long as the application is running and a server object that persists for as long as the server is running.

These objects provide a natural and simple way to maintain state. For example, if an application has to compute something with a very expensive database query, it could cache the result in the project object. Setting information for a client session is very straightforward as well. You can just say "client.session_id = 37;" and the server will remember.

The semantics of client objects are good, but Netscape's implementation of them in LiveWire 1.01 is abysmal. You have several choices for maintaining these objects. What you'd think would be the best way to do this is to hold them on the server and then reference them via a unique key stored either in a magic cookie or encoded in the URL.

Netscape provides this method, but they provide it in an incompetent fashion. The documentation refers to a server-side "database" of these objects but it isn't a real database management system like Oracle. When a page wants to get client object information, LiveWire "checks out" the entire object and subsequently denies even read access to these objects to other pages. This avoids bugs due to lack of concurrency control, but it means that the client object is unusable for many applications. Two subframes of the same frame, for example, cannot both get client object info. Or if the user, deciding that one database-backed page is too slow, opens another browser window and uses that to connect to another portion of the same LiveWire app then the second connection won't be able to get to the client object data it expects. This will probably result in a server-side error.

If you want stuff to work, you probably have to set LiveWire to ship all the data back and forth to the client with every page access. This approach has several disadvantages, the first of which is speed. You are gratuitously transporting potentially many kilobytes of data back and forth across the network.

The second drawback is flexibility. Browsers aren't required to store more than 20 cookies for a particular path so you can't have more than 20 client object variables. I don't think the programmer even gets a warning when this number has been exceeded and information is being lost.

One of the most serious objections is that confidential information may have to be sent back to the user. Netscape's examples include scenarios where the RDBMS username and password are stored in a client object. One certainly wouldn't want these residing in a random user's Netscape .cookie file. The same goes for private information that the user has supplied, like a credit card number; a side effect of using a Web site shouldn't be that the user's credit card number ends up stored back on the client computer (which might, for example, be a machine in a public library's reading room).

Database connections are handled more efficiently from LiveWire than with CGI, but less efficiently than with the AOLserver. AOLserver allows you to set up a reasonable number of simultaneous connections to the database (eight, for example), and then all the users share those connections. The operating system sees a stable configuration because the number of processes remains constant. Netscape's basic model is that one user of a LiveWire application equals one database connection. This means that the server is forking fairly frequently as users come and go on the site. You might even have two or three database connections for a single user if you don't want your whole site to be one monolithic LiveWire app.

Database vendors like Oracle are still living in the 1980s when it comes to their C libraries, which aren't "thread-safe." Unfortunately, the modern way to build Web servers is not with Unix multiprocessing but with threads. NaviSoft dealt with this rather elegantly by adding some locks to the Illustra C library and then writing external driver processes for other RDBMS vendors. Netscape deals with this by saying, "One Enterprise Server process will only have one connection to the RDBMS." So that means you have to carefully set up your Enterprise Server to have lots of processes and very few threads per process. In the end, if you have 100 users interacting with your server, you'll have 100 Netscape Enterprise processes spawned plus 100 RDBMS server processes spawned. Your CPU and memory vendors will be very pleased indeed with this server load requirement. You'd probably be able to handle the same number of users with AOLserver on a quarter or one-eighth the server horsepower.

Depending on your licensing arrangement with the RDBMS vendor, you might find that it costs a lot of extra money to have hundreds of simultaneous connections rather than a handful.

Note: If you are running Windows NT, the situation is a bit different. Enterprise Server runs as only one process on NT and relies on database vendors producing thread-safe libraries. Unfortunately, Informix (the database bundled with LiveWire PRO) didn't get with the program. Their library is not thread-safe. Hence each LiveWire application can only keep one connection to the database open at once.

Big companies like to buy Netscape server programs because they think they will get support. Netscape support can be useful if you have done something wrong in configuring the server. If you are a paid-up customer, you can e-mail them your .conf files and they will figure out what the correct incantations should be. However, if your problem is due to a bug in their code then the support staff is at sea. They will try to help you find a workaround but I've never really seen them persist. Nor have I ever seen them deliver a patched binary to fix a problem identified by a customer.

The problems that I've personally encountered with Netscape Enterprise include:

Source code is not available for Netscape servers; if you don't like the support that you get from the company then you are stuck.

The best reason for running the Netscape servers is the availability of shrink-wrapped software packages for their C API. People with big money run the Netscape servers, so Web technology companies always port their CGI scripts to the Netscape API first.

Like all threaded servers, Enterprise and FastTrack are very fast.

Oracle WebServer 2.0

If you've ever tried to use http://www.oracle.com (one of the slowest and least reliable sites on the Internet) then you probably won't be in the market for this server program. Nonetheless, you'd expect it to be quite adept at connecting to relational databases, or at least the Oracle relational database.

WebServer 2.0 includes Java and PL/SQL APIs. Both are rather underpowered by AOLserver standards. For example, if you want your PL/SQL-backed page to send e-mail, you have to install (literally) another 100MB of Oracle software. You can connect to the Oracle RDBMS through either Java or PL/SQL, but both approaches are extremely slow. Server throughput and responsiveness are about one order of magnitude worse than with AOLserver. Probably Perl CGI scripts running behind a conventional Web server like Apache would be just as fast as WebServer 2.0.

No source code is available for Oracle WebServer and support is illusory at best. In the released version of WebServer 2.0, the only way for a PL/SQL-backed page to issue an HTTP redirect crashed the entire Web server. It took a week of plowing through the Oracle support organization to figure out that this was a known bug. This is the kind of thing that the AOLserver folks would have fixed in a few hours. Netscape would have rushed a patch release out the door if anyone had found a bug like this. It took Oracle six months to fix it.

My theory of why Oracle is unable to support a Web server is that they are steeped in the culture of the RDBMS, a technology that has changed only gradually since its inception in 1970. The art of collaborating with customers to find the best solution comes naturally to a lot of Web software vendors but not to RDBMS vendors.

The kinds of companies that would be likely to take Oracle's word for it and install WebServer 2.0 are also the kinds of companies that would be likely to purchase shrink-wrapped software. However, I've never seen anyone offering shrink-wrapped software for Oracle WebServer 2.0, perhaps because, as of January 1997, the NetCraft folks could only find 310 sites worldwide that were running it (versus 87,000 Netscape-backed sites).

I never figured out how fast Oracle WebServer 2.0 was at serving static files and Oracle has never posted any benchmarks on their site. It is a threaded server so presumably the answer is "fast enough."

Connectivity

Sheep at the New Jersey State Fair 1995.  Flemington, New Jersey.

Having an eight-headed DEC Alpha with 1GB of RAM in your living room is an impressive personal Web server, but not if it has to talk to the rest of the Internet through a 28.8 modem. You need some kind of high-speed Internet connection. If you aren't going to be like me and park your server at above.net then you need to think about higher-speed connectivity to your premises.

ISDN

Integrated Services Digital Network (ISDN) is a 128-kbps point-to-point connection from the phone company. It is the bare minimum bandwidth that you need to run any kind of Web server, though most publishers would be far better off co-locating a Unix box in a T1-connected network somewhere and remotely maintaining it.

If you decide to take the ISDN plunge for your home or business in order to manage that co-located Web server, try to get one vendor to take responsibility for the entire connection. Otherwise, your position will be the following: You want packets to go from the back of your Macintosh into the Net. If the packets are getting stalled, it could be a malfunctioning or misconfigured BitSURFR, in which case you need to call Motorola tech support (yeah, right). It could be the line, in which case you should call your local telephone company. It could be your Internet service provider, in which case you should pray.

With three organizations pointing fingers at each other and saying it's the other guys' fault, it is amazing to me that anyone has ever gotten ISDN to work. I finally got mine to work by scrapping the Motorola BitSURFR and getting an Ascend Pipeline 50 router, the product recommended by my ISP (an in-house MIT organization). They wanted me to get an Ascend 50 so that they could configure it properly; I would just take it home and it would work.

It didn't.

I called Ascend tech support and waited two minutes on hold before being connected to Jerome. He dialed into one of my ISDN channels and poked around inside the Pipeline 50. Then he said, "Your subnet mask is wrong for the range of IP addresses that they've given you. I fixed it." I noted that my Macintosh was complaining that another computer on the same wire was claiming that same IP address. "Oh, you've got Proxy ARP turned on," Jerome said. "I've turned it off. Everything should work now."

It didn't.

It turns out that MIT bought its big ISDN concentrator from Ascend as well. So I had Jerome connect into that and poke around. "They've set the subnet mask incorrectly for you there as well."

The only thing that could have darkened my day at this point was the bill. ISDN was designed in the 1970s to provide efficient, reasonably low-cost point-to-point digital communication across the continent. The assumption was that you really cared to whom you were connected and would be willing to pay a big price for that service.

Now people just want to use ISDN for Internet access. They don't really care to whom they are connected. In fact, having to choose an ISP is an annoyance and they would probably much rather the phone company took the bits and routed them into the Net. However, due to regulatory restrictions and corporate inertia, the Regional Bell Operating Companies (RBOCs) haven't all caught up to this.

Most of the RBOCs charge you per minute if you are using your ISDN line to call across town with a data connection. An example of a forward-thinking RBOC is Pacific Bell. They will provide you with a complete package: ISP service, modem, and line. For this you pay $75 per month. If you use the line between 8 a.m. and 5 p.m., Monday through Friday, you pay 1 cent per minute, so if you left your line connected continuously you'd pay an extra $120 per month. By contrast, assuming you could ever get three vendors in Massachusetts to work together, the same pattern of usage would cost you 1.6 cents a minute times 24 hours times 60 minutes times 30 days equals . . . about $700 a month! If you want to call a little farther or, God forbid, your line is billed at business rates, you could be paying a lot more. One guy in my lab at MIT got a bill from NYNEX for $2,700 one month. NYNEX will be naming their next building after him, I guess.
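
For the record, here is the arithmetic behind those two figures as a quick Python sketch (rates taken straight from the paragraph above; the 4.33 weeks per month is my approximation):

# Reproduce the back-of-the-envelope ISDN bills from the text.
pacbell = 0.01 * 60 * (17 - 8) * 5 * 4.33    # 1 cent/min, 8am-5pm weekdays
mass    = 0.016 * 60 * 24 * 30               # 1.6 cents/min, around the clock
print(f"PacBell peak-hours surcharge: about ${pacbell:.0f}/month")  # ~$117
print(f"Massachusetts, always on:     about ${mass:.0f}/month")     # ~$691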

A more common approach is to defraud the phone company by programming your equipment to originate all calls with a voice header. It looks to the phone company like you've made a voice call to another ISDN telephone and are chatting away. But your ISP has in fact programmed their "modem bank" to answer all calls whether they have voice or data headers. You end up getting 56K instead of 64K per channel but you only pay the voice tariff, for which there is usually a flat monthly rate.

The reaction to this common practice varies among the RBOCs. The good ones say, "We really ought to provide flat-rate ISDN data for customers. In fact, we really ought to just give them Internet service before they all desert us for the cable TV companies." A more common attitude is "We're never going to do flat-rate ISDN because the customers are tying up our switches and capacity and it is costing us and we'd really like to disallow flat-rate voice, too, so that the analog modem crowd doesn't clutter our switches."

Your First T1

If you want to join the club of Real Internet Studs, then at a minimum you need a T1 line. This is typically a 1.5-Mbps dedicated connection to somebody's backbone network. You generally get the physical wire from the local telephone monopoly, and they hook it up to the Internet service provider of your choice. The cost varies by region. In the San Francisco Bay Area, you can get a whole package from PacBell, including the wire plus the Internet packet routing, for $800 a month. More typical is a $2,000-a-month package from a vendor like ANS, BBN, MCI, or Sprint (the four leading backbone operators).

I think it is risky to rely on anyone other than these four companies for T1 service. It is especially risky to route your T1 through a local ISP who in turn has T1 service from a backbone operator. Your local ISP may not manage their network competently. They may sell T1 service to 50 other companies and funnel you all up to Sprint through one T1 line.

You'll have to be serving about 500,000 hits a day before you max out a T1 line (or be serving 50 simultaneous real-time audio streams). When that happy day arrives, you can always get a 45-Mbps T3 line for about $50,000 a month.
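
Where does the 500,000 figure come from? Here's a back-of-the-envelope sketch, assuming (my assumptions, not gospel) roughly 10 KB transferred per hit and a peak hour that runs about three times the daily average rate:

# Rough capacity check for the "500,000 hits a day" claim.
T1_BITS_PER_SEC = 1.5e6
BYTES_PER_HIT   = 10 * 1024    # assumed average transfer per hit
PEAK_TO_AVERAGE = 3.0          # assumed peak-hour traffic factor

bytes_per_day = T1_BITS_PER_SEC / 8 * 86400 / PEAK_TO_AVERAGE
hits_per_day  = bytes_per_day / BYTES_PER_HIT
print(f"~{hits_per_day:,.0f} hits/day before the peak hour saturates the T1")
# ~527,000 with these assumptions, in line with the figure in the text.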

Cable Modems and ADSL

The cable network is almost ideal topologically for providing cheap Internet access. There is one wire serving 100 or 200 houses. Upstream from this there is a tree of wires and video amplifiers. The cable company can say, "We declare the wire going to your house to be a Class C subnet." Then they put a cable modem in your house that takes Ethernet packets from the back of your computer and broadcasts them into an unused cable channel. Finally, the cable company just needs to put an Internet router upstream at every point where there is a video amp. These routers pull the Internet packets out of the unused cable channels and send them onto one of the usual Internet backbones.

If all of the folks on your block started running pornographic Web servers on their NT boxes then you'd have a bit of a bandwidth crunch because you are sharing a 10-Mbit channel with 100 other houses. But there is no reason the cable company couldn't split off some fraction of those houses onto another Ethernet in another unused video channel.
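
If you want to see how quickly the crunch arrives, here is a quick calculation (the channel size and house count come from the text; the activity levels are my assumptions):

# Per-house throughput on a shared 10 Mbps cable channel.
CHANNEL_MBPS = 10.0
HOUSES       = 100

for active_fraction in (0.05, 0.25, 1.0):   # assumed fraction of houses active
    active = max(1, int(HOUSES * active_fraction))
    print(f"{active:3d} houses active: {CHANNEL_MBPS / active:5.2f} Mbps each")
# 5 active -> 2 Mbps each; all 100 active -> 0.1 Mbps each (the crunch).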

My friend Pete lives in Newton, Massachusetts, and he was one of the original Continental Cablevision beta testers. I asked him how long it took him to hook up his Macintosh to the network. "Thirty-seven seconds." Though Continental only promises 1.5 Mbps in and 300 kbps out, he says that it is faster than being inside MIT Net. The price is $60 a month for unlimited access, and that includes some traditional ISP services like POP mail and a news server. If you don't trust your cable company, don't give up on your local phone company. In their more candid moments and incarnations, the phone companies more or less concede that ISDN sucks. Their attempt to keep you from pulling the plug on them and letting your cable monopoly supply all of your communications needs is Asymmetrical Digital Subscriber Line (ADSL).

Telephony uses 1 percent of the bandwidth available in the twisted copper pair that runs from the central phone office to your house ("the local loop"). ISDN uses about 10 percent of that bandwidth. With technology similar to that in a 28.8 modem, ADSL uses 100 percent of the local loop bandwidth, enough to deliver 6 Mbps to your home or business. There are only a handful of Web sites on today's Internet capable of maxing out an ADSL line. The price will certainly be very low because the service was primarily designed to deliver video streams to consumers, in competition with video rental stores and pay-per-view cable.

As a Web publisher, your main question about ADSL will be what happens to the packets. There is no reason the phone company couldn't build a traditional hierarchical data network to sit in their central offices next to their point-to-point network. Then they could sell you low-cost one-vendor Internet access and their cost would be comparable to that of the cable companies. Some of the RBOCs, however, are afflicted with the brain-damaged notion that people want to choose their router hardware and their ISP. So you'll have to buy an ADSL line from your RBOC and then also cut a deal with ANS, BBN, MCI, or Sprint to carry your packets into the Net.

The Big Picture

Burning car.  New Jersey 1995.

Processing power per dollar has been growing exponentially since the 1950s. In 1980, your home computer was lucky to execute 50,000 instructions per second and a fast modem was 2,400 bps. In 1997, your home computer can do 200 million instructions per second but it communicates through a 28,800-bps modem. You've gotten a 4,000-fold improvement in processing power but only a tenfold improvement in communication speed.

The price of bandwidth is going to start falling at an exponential rate. Your challenge as a Web publisher is to figure out ways to use up all that bandwidth.


If you like this chapter then you probably should buy a copy of the book from which it was taken and/or read the book on-line.


philg@mit.edu

Reader's Comments

I suppose most web sites out there are running on some Unix system, but there's a small yet tenacious group of system managers who cling to VMS, mostly because it's so robust and reliable. Plus it offers mundane features like a help facility that's actually useful (which is accessed by typing HELP, of all things).

The best web server package for VMS is OSU WEB, written by Dave Jones at Ohio State University. It's free, and it has excellent support through a fairly active mailing list (vms-web-daemon-request@kjsl.com to subscribe).

-- Javier Henderson, May 11, 1997

The rationale you give for buying Sun equipment is that most free software out there "just works" when you build it for a Sun machine. I have a few things to say about that:

1. I argued long and hard with the executives of Sun that they should have a closer-than-arms-length relationship with the GNU software folks, because that would make their machines the hardware of choice for an ever-increasing niche of the market. (The VPs were split on this issue: while some agreed it was fundamentally sound logic, others followed that logic to the conclusion that I was really branding Sun's own software as crap that people would discard in preference to GNU.) In any event, Sun was very free-software friendly, and the fact that Richard Stallman and other GNU hackers all used Sun machines only strengthened Sun's position.

2. Over time, Linux has far overtaken Solaris in terms of being free-software and hacker friendly. GNU (and other free software, such as Apache) has become sufficiently mature that essentially all "important" free software works pretty much indistinguishably across Sun, SGI, and other "proprietary" platforms.

3. In the case of hardware, I think more people should look seriously at what SGI has been producing. I became a convert a few years ago, and have nothing but great things to say about their new lineup. Whereas you cite a "wimpy Sparc5" doing 30-70 updates-per-second (depending on whether you have autocommit on or off), I just measured 212 updates/second on a similarly vanilla O200 server running Oracle 7.3.3, regardless of the state of the autocommit variable.

Cheers,

Michael

-- Michael Tiemann, August 10, 1997

I like your pages.

Regarding your uptimes... We had a Linux news server with 103 days of uptime. No prob. I rebooted it to upgrade the kernel from 1.3.52 to 2.0.X.

Cheers Peter

-- Peter Keel, August 25, 1997

As for Web server platforms, another Unix to really take a look at is FreeBSD. We are running a 6,000-user ISP, along with Web hosting services for about 350 domains, on Pentium Pro 200s. We have overbuilt our site, having a machine for each service, but it pays off when you have to do maintenance on one of those machines. With the average cost of a PPro 200 fully loaded for a Web server being under $3,000 (when using GOOD hardware, the key to having this work well), this is not something to ignore.

Whoever it was using a sound-card SCSI card got what he deserved. Most PC hardware is junk; you just have to sort through it and find out what is not. Mailing lists for the OS you want to use are a great place to find out, along with FAQs. I use Intel PPro or ASUS motherboards, Adaptec 2940UW/3940UW SCSI cards, a high-quality fast-wide 7200-rpm drive such as a Quantum Atlas II, an Intel 10/100 Ethernet card, and a good quality case with a good power supply and cooling. With a combo like this, you will have very few problems. I see machines with 100-day-plus uptimes all the time. Most of the downtime here is upgrading hardware, not fixing it. That is one thing PCs don't do well: the ability to upgrade while running. And Sun only does this for some things.

To increase your data reliability, look at a RAID 5 box; the base price for a good SCSI-to-SCSI unit is about $2,000. Now compare this with Sun's RAID array and you will notice a large price difference. When you buy Sun, one thing you are paying for is the time they have spent testing interoperability. Suns do have their place. PCs do also: if you have a hardware failure, you can (1) pull apart your desktop machine for the night for that power supply that went bad, or (2) run down to one of who knows how many local computer stores and get a replacement, something you cannot do with Suns (in most areas at least).

-- Cameron Slye, November 27, 1997
Regarding Linux, Netscape FastTrack Server is now available from Caldera with certain versions of their OpenLinux distribution; it is an extremely easy-to-administer web server (graphical administration via the web, etc.) for people who do not wish to take the time to learn the Apache setup. (Although, IMHO, Apache is a far better server.) On the cheap PC hardware issue, you get what you pay for. I have many systems running Linux which are extremely stable (even on the newest development kernel). The only reason I ever have to take my machines down is to upgrade them; just as an experiment, I have a 486 sitting in a closet acting as a router (Linux does a great job at this, too) which I have not upgraded or rebooted in over a year.

-- Nate Carlson, December 26, 1997
One more comment: Oracle has officially announced support for Linux, to be released in March of 1999 or thereabouts.

-- Nate Carlson, July 18, 1998
Uptime? Watch this: a 90-MHz Pentium processor running my intranet, sendmail, fetchmail, lights through X10, etc. The uptime is limited only because about 3 months ago I didn't yet have a UPS. Now I do...

12:32am up 97 days, 10:19, 1 user, load average: 0.06, 0.03, 0.01

Not bad for an old machine, eh? Oh, and I'm running Linux, Apache, MySQL, etc. Not as fancy as HP-UX, but I salvaged the machine from a scrap heap.

-- Peter van Es, March 14, 2000
