Building Relational Database-Backed Web Sites

by Philip Greenspun for the Web Tools Review

Note: This content has been superseded to a large extent by four chapters of my new book.


What is a Relational Database?

In the bad old days, there were hierarchical databases. These were essentially files of records with pointers to other records. You wrote a PL/I program or something to tranverse these pointers and got your data out.

IBM decided that this was too efficient for their new, fast, expensive mainframe computers. In the 1970s, they therefore invented relational database management systems (RDBMS) and the declarative query language SQL. A declarative language allows the user to say what he wants and then the RDBMS query planner figures out how to get it. The RDBMS is allowed to store the data however it wants.

It is much harder to have "a little bug" in an SQL query than in an arbitrary program. Generally it either describes the data that you want and works or it completely fails in an obvious way.

The main problem with RDBMSes is performance. If your query runs too slow, you have to try to figure out what this enormously complicated program is doing. You didn't write this program and you don't have the source code. You then have to figure out what different SQL incantations will get you the same set of data back but faster.

Of course, you could also buy a bigger mainframe... (or keep running the non-relational software you used in the 1960s, which is what the airlines do for their reservations systems)

What's wrong with using the Unix file system?

The Unix file system is a failure by any of the standards that professional computer scientists apply to file systems. It is unreliable, has no tolerance for broken hardware, and is slow. It is also way better and faster than an RDBMS if what you want to do is serve static data to the Internet.

If you have a big database of information that seldom changes, you may be much better off writing a program to extract it from the database and generate static HTML pages. You can repeat this process periodically and the net result is that you will consume 1% of the computer resources that you would have if you'd done the queries on-line. This is how Yahoo works, for example.

You need an RDBMS if you have data that changes frequently. Maybe you are allowing users to add themselves to a mailing list. Maybe you want to let 10 different people update a patient's medical record. It is possible to write perl, C, or Lisp programs that maintain this information in ordinary files but if you make a mistake in one of these programs, it is possible that attempted simultaneous updates will trash your database and/or that one update will be lost. Relational databases have concurrency control and make sure that all the updates happen without any tables getting corrupted.

How does an RDBMS talk to the rest of the world?

Have you ever heard a $250,000/year MBA talk ad nauseum about "client/server"? The term originated in the RDBMS business data processing world. The profound concept here is that the database server is a program that sits around waiting for another program, the database client to request a connection. Once the connection is established, the client sends SQL queries to the server, which inspects the physical database and returns the matching data. These days, all connections are made via TCP sockets even if the two programs are running on the same computer.

For a traditional RDBMS-backed Web site, the RDBMS client is the Web server program (e.g., NaviServer 2.0) or maybe a CGI script that has been spawned in response to a user request for a URL. The user types something into a form on a Web client (e.g., Netscape Navigator) and that gets transmitted to the Web server (e.g., NCSA 1.4) which is itself or spawns an RDBMS client (e.g., perl script) which has or opens a connection to an RDBMS server (e.g., Oracle). The data then goes back from the RDBMS server to the RDBMS client to the Web server which sends it back to the Web client.

Does that sound complicated and slow? Well, yes it is, but it can be fast enough. I interfaced the 60 GB Boston Children's Hospital to the Web this way using the rudest of tools (Oracle 6, Oraperl, NCSA 1.3, ancient Sun SPARC). Try it out.

How to make really fast RDBMS-backed Web sites

If you run a Web server that maintains a pool of already-open connections to one or more RDBMS systems and run your programs inside the Web server instead of as CGI scripts, then for each URL requested, you save: (1) the cost of the Unix fork for the CGI script, (2) the cost of the Unix fork for the Web server (though Oracle 7 and imitators pre-fork server processes), (3) the cost of establishing the connection to the database, including a TCP session and authentication (databases have their own accounts and passwords), (4) the cost of tearing all of this down when it is time to return data to the user.

You could probably write all this into the API of a lot of servers. The problem is that servers tend to use C as an extension language. That means your software development cost will be prohibitive unless you shop the work out to Bangalore. Even if you are rich, if one of your C drones makes a typo, the little API extension will run wild in memory and bring down all of your Web services, not just break that one URL as in the case of a bad CGI script.

If this frightens you, you could just use NaviServer 2.0. Wizards Jim Davidson and Doug McKee have done all of the above work for you and it has been tested in hundreds of production sites since May 1995. I have personally used this software for several heavily used sites and feel that it is without question the best way to go for RDBMS-backed sites. You can write all of your code in C or Tcl and talk to the database. The Tcl runs inside the NaviServer's process and imposes no significant overhead (Tcl is a slug but that is negligible compared to the cost of going out to the RDBMS). [Note: I maintain a server of NaviServer example apps, complete with source code.]

The NaviServer Tcl API was introduced in early 1995. It remains for me the most efficient and programmer-friendly way of talking to an RDBMS. There are some competitors finally reaching the marketing, though, e.g., Netscape Livewire.

If you are building a richly interactive site and want the ultimate in user responsiveness after a painful and slow initial download, then Java is the way to go. You can write a Java applet that starts running inside the user's Web client and makes its own TCP connection back to the RDBMS, thus completely bypassing the Web server program. Of course, this won't work for people without Java-compatible Web browsers. Still, it is probably the way to go if you're doing an on-line chat system or something.

Of course, just because you are talking to the database in a reasonable manner doesn't mean that the database per se can handle the load.

How to maintain the data in the RDBMS

If you have an RDBMS-backed Web site, you probably have some thousands of users "out there" and a handful of people "back here" who are supposed to upate the information in the site periodically. NaviServer provides some administrative tools that let anyone with a Web browser and the right password create and edit tables in the database. If you don't like that interface, which is sort of clunky and won't let you do joins (so all the info has to be in one table), then you can build a whole pile of CGI scripts or NaviServer Tcl scripts that sugar up the tasks. RDBMS systems tend to come with Mac or PC clients that will allow you simultaneous access and a potentially nice user interface to the DB. Finally, there is Microshaft Access which my friend Lori swears by. You can use this with any database that has an "ODBC driver" which is a Windows program that makes the TCP connection to the RDBMS and then presents the data to Access in a form that it is prepared to accept (this works with a lot of other PC programs too; you can pretty much use all the standard RDBMSes interchangeably). Unfortunately, I'm not a Microsoft Achiever (i.e., I don't have a Windows machine) so I've never tried it.

Remember that backups and maintenance of an RDBMS are typically done by a full-time staffer at most companies. Be prepared for some days of pain installing and then periodic recurring pain if you want to be your own database administrator (dba). Of course, if you get good at it and this Web thing collapses, you can always get a job at a big company as a dba (Oracle dba's are always in high demand and start at $60-80,000/year).

Choosing an RDBMS

In general, all the RDBMS companies claim to understand exactly how a winning Web site should be built. You give them your money and they'll tell you what time it is on the Web. It all sounds really plausible until you look at the slow, content-free Web sites they've built for themselves. None of the Web crawlers were built by any of the database vendors and, so far as I know, none of the Web crawlers even use any software made by the RDBMS vendors. Except for Sybase, none of the DB guys have even figured out how to put their documentation on the Web.

Oracle

"Nobody ever got fired for buying an Oracle database." I've never actually heard anyone say that, but it might be true. Oracle is the standard. If there are random third-party tools that you want, you can bet they take full advantage of Oracle. Oracle is comparatively fast and stable. Larry Ellison, their founder, gives hope to nerds everyone by driving around in his Acura NSX and getting sued by babes for sexual harassment (of course, everyone envies Bill Gates for his money, but how many envy him for his lifestyle?).

Oracle is probably the best choice for most people, but the company is also famous for reaming its users and then holding out a bucket to catch the gushing money (how do you think Ellison paid for his NSX?). If you swagger in there and tell them you're going to have 100,000 users for your Web site running on your 16-headed DEC Alpha, they'll cheerfully look that up in their price book and hand you a bill for $1 million.

Actually, you may run out of energy before you even get a price quote. Oracle is waiting to see if this Internet fad catches on so you won't find prices on their Web site (which is wicked slow because they're waiting to see if this Netscape WIDTH and HEIGHT tag fad from 1994 is going to last). If you send email to their sales email address, they very helpfully email you back with the Oracle 800 number. I called it; a recording said that everyone was out being trained.

If you are using a single Web server to make all of your connections, then the RDBMS thinks that there is in fact only one user out there who happens to want to leave, say, 16 client programs open simultaneously. Then you might only need a minimum 5-user license which is under $16,000 for Oracle Enterprise (necessary if you want Context, the new text indexing product) or $1500 for Oracle Workgroup (basically the same program). If you write Java applets and let users connect from all over the Net, then each one counts as a separate user.

A nice thing about Oracle is that they rely on lawyers rather than C programs to manage licensing. They'll threaten to audit you and then harass you if you consistently have too many users connected, but they won't shut your Web server down because a buggy C program thinks you've exceeded your license (when you in fact haven't). [See Illustra, below.]

I've never personally seen anyone get useful assistance from Oracle support. Recent example: I reported a problem with a PL/SQL program that looked like it should work. A week later, I still hadn't heard back from anyone. So I telephoned and was told to try changing two characters in the 25-line example (I was using a PL/SQL table of records and they wanted me to switch from 0-based indexing to 1-based, something I'd in fact tried before asking for support the week before). I said "after a week, all you can tell me to do is make a change that you could have tested yourself in 30 seconds?" The change resulted in a different error message ("internal error") but still no useful computation. So I called back and was told that I eventually would be told whether this was in fact a bug. I never got a workaround. I never got an answer.

Oracle WebServer 2.0 Review
I've played around quite a bit with Oracle WebServer 2.0. It is a fairly well-conceived product that combines a threaded server with the ability to have PL/SQL or Java-backed pages without forking. It comes as a 102 MB tar file and installation can range anywhere from painful to impossible. They have a configuration server that you drive from Netscape but I've never managed to make OWS 2.0 work without editing config files in Emacs.

Here are the things I like about OWS 2.0:

Here are the things I don't like about OWS 2.0: I think that one can get 95% of the benefits of OWS 2.0 (and none of the problems) by writing a bunch of PL/SQL functions that do expensive queries and return strings. Then you use the proven reliable Tcl API on the proven reliable NaviServer to talk to Oracle. This way you get all the benefits of caching and tight semantics of PL/SQL, the speed of not having to fork or open database connections, and the flexibility of Tcl when appropriate. Unfortunately, development of the native NaviServer Oracle driver seems to be stalled right now (September 1996 at the earliest?). So the only way to have this "best of both worlds" is to run NaviServer on Windows NT and have it talk to Oracle via the ODBC driver (allegedly inefficient). Oh yes, OWS 2.0 costs $2500; NaviServer is free.

Free code: If you want to see an example of an OWS 2.0 application, check out my Bulletin Board (threaded discussion group) demo. It is available as both NaviServer Tcl for Illustra and as OWS 2.0 PL/SQL.

Oracle, of course, believes that their proprietary Web servers are the future. Just check the "Server:" header below to gauge the depth of their commitment...


philg-sub-47> telnet www.oracle.com 80
Trying...
Connected to www-2.us.oracle.com.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.0 200 Document follows
Date: Wed, 06 Mar 1996 22:37:51 GMT
Server: NCSA/1.5
Content-type: text/html

(Sometime late in the summer of 1996, Oracle could no longer stand the humiliation being inflicted on them by suck.com and articles like this. So they are actually using their own software now.)

Sybase

I've never used Sybase, but I think it is pretty comparable to Oracle. They have less market share and were apparently the first to make it easy to run replicated databases across multiple servers.

Informix

Informix made its reputation by allowing awesome numbers of transactions on huge parallel machines. I think that Oracle and Sybase can do this these days as well. I only have personal knowledge of one Informix site and the people there raved about the customer support. Informix apparently ran all kinds of tests with different data models on different machines and produced reports to figure out how this customer should organize everything.

Microsoft

Microsoft sells something called SQL Server as part of BackOffice. It runs on Windows NT and at least on a feature checklist looks pretty good. I asked a database wizard friend about it. He said "it's really slow." I replied, "but it's cheap, only about $1000 for the server and then maybe $100 for each client." He said, "You get what you pay for."

Let's hope it is more reliable than Word and Excel...

Illustra

I've had more experience with Illustra than with any other product and therefore more pain. I give away all of my tips for using this system. The database has some type extension capability, which they called "object orientation" so that it sells better to MBAs. If what you really need is an object database, you'll find that ObjectStore and similar products are literally about 1000 times faster. However, having higher level types, e.g., images, can make it possible to do in SQL what you formerly would have been forced to do with an SQL query and then grabbing something out of the Unix file system. If you feel like writing a bunch of C code and having your bugs crash the database server, then you can really extend the Illustra system by adding your own radically new types.

As far as I'm concerned, though, Illustra has really never understood the fundamental advantages of object-oriented programming that go back to the 1970s. That is, you can write a complex system in modular chunks with advertised interfaces and then the next programmer who comes along doesn't have to start from scratch. He can simply write a bit of new code that overrides and/or specializes the behavior of the existing modules. This isn't the case in Illustra and because what they built didn't adequately anticipate my needs as a Web publisher, I've ended up having to use their system exactly as I'd use Sybase or Oracle. [see my tips page]

What's the problem with that? Well, Illustra 3.2 really isn't as good an Oracle as Oracle 7. Oracle and Sybase can do at least 30 inserts and updates/second on a generic Unix box. Illustra is down around 10. Illustra SELECTs can be quite fast, but sometimes their query planner and caching algorithms will hose you; your user ends up waiting minutes for a query that would have executed in seconds on an Oracle.

I've found Illustra support to be excellent. They've gone the extra mile for me even when the problem was simply my lack of SQL ability. Unfortunately, their development staff hasn't been very responsive. They seem sure enough that they've built the ultimate "database for cyberspace" that they don't need to hear suggestions from actual Web site developers.

Informix bought Illustra in December 1995. They plan to ultimately merge the Illustra and Informix systems into a "Universal Server" (which is, confusingly, what Oracle calls its currently shipping release (Oracel 7.3)). This is currently vaporware, but one tangible change from the Informix merger is that the price of Illustra for use on the Web has gone way up.

References

On the Web

The formatting is ugly, but you can get a feel for what it is like to administer and use a typical RDBMS by looking at the on-line Sybooks, which document the Sybase RDBMS. The most interesting one here for beginners is probably Transact-SQL User's Guide.

A SQL Database->Web page has some useful links.

Jean Anderson wrote a cute interactive SQL tutorial on the Web (built with NaviServer/Illustra).

If you want to get that old time religion, Oracle Magazine has a lot of practical articles and demonstrates that Oracle might actually understand that Web some day because they removed their evil registration requirement.

Dead Trees

It took me a few days to learn SQL at a rudimentary level. My favorite book was The SQL Guide to Oracle (Rick F. van der Lans; Addison-Wesley). Unfortunately, this book seems to have been discontinued, presumably because it covers Oracle 6 and not the latest and greatest (Oracle 7, which from a naive user's point of view is no different). Another book that isn't bad is The Practical SQL Handbook (Bowman, Emerson, Darnovsky; Addison-Wesley) but I don't find it as clear and lots of the examples include Sybase extensions that aren't always identified as such. SQL for Smarties: Advanced SQL Programming (Celko; Morgan) Kaufmann is badly written and organized, but I recommend it because of the sheer frightening nature of the examples. If you've gotten your pathetic Web site up and running and are proud of yourself for your wimpy little SELECTs and thinking of applying for that $200,000/year job at CitiCorp, then check out this book to see what real database programmers do.
Return to Web Tools Review home
Text and pictures copyright 1989-1996 Philip Greenspun. Most of the photos are from Heather Has Two Mommies, not one of my proudest achievements but it just won't die (20,000 hits/day for several years now).


philg@mit.edu

Reader's Comments

The Software AG (http://www.softwareag.com/) in Germany has also a very impressive relational Database system called ADABAS. It is available on all main operating systems (including Linux). The database is mainly used in conjunction with the SAP system. Integration is available for C/C++, TCL, Perl and PHP/FI.

-- Frank Tegtmeyer, October 12, 1997
Add a comment

Related Links

Add a link