Choosing a Relational Database

part of Building Relational Database-Backed Web Sites by Philip Greenspun

What's wrong with a file system (and also what's right)

The file system that comes with your computer is a very primitive kind of database management system. Whether your computer came with the Unix file system, NTFS, or the Macintosh file system, the basic idea is the same. Data is kept in big unstructured named clumps. For example, suppose that you are storing a mailing list in a file system file. You could decide that no email address or person's name can contain a newline. Now you can store one entry per line. Then you could decide that no email address or name may contain a vertical bar. That lets you separate email address and name fields with the vertical bar character.

So far, everything is great. As long as you are careful never to try to store a newline or vertical bar, you can keep our data in this "flat file." Searching can be a slow and expensive, though. What if you want to see if "philg@mit.edu" is on the mailing list? Then you have to read through the entire file to check.

Let's say that you write a program to process "insert new person" requests. It works by appending a line to the flat file with the new information. Suppose, however, that several users are simultaneously using your Web site. Two of them ask to be added to the mailing list at exactly the same time. Depending on how you wrote your program, the particular kind of file system that you have, and luck, you could get any of the following behaviors: (1) both inserts succeed, (2) one of the inserts is lost, (3) information from the two inserts is mixed together so that both are corrupted. In the last case, the programs you've written to use the data in the flat file may no longer work.

So what? Emacs may be ancient but it is still the best text editor in the world. You love using it so you might as well spend your weekends and evenings manually fixing up your flat file databases with Emacs. Who needs Concurrency Control?

It all depends on what kind of stove you have.

Yes, that's right, the stove. Suppose that you buy a $268,500 condo in Harvard Square. You think to yourself, "now my friends will be really impressed with me" and invite them over for brunch. Not because you like them, but just to make them envious of your large lifestyle. Imagine your horror when all they can say is "What's this old range doing here? Don't you have a Viking stove?"

A Viking stove?!? They cost $5000. The only way you are going to come up with this kind of cash is to join the growing ranks of on-line entrepreneurs. So you open an Internet bank. An experienced perl script/flat-file wizard by now, you confidently build a system where all the checking account balances are stored in one file, checking.text, and all the savings balances are stored in another file, savings.text.

A few days later, an unlucky combination of events occurs. Joe User is transferring $10,000 from his savings to his checking account. Judy User is simultaneously depositing $5 into her savings account. One of your perl scripts successfully writes the checking account flat file with Joe's new, $10,000 higher, balance. It also writes the savings account file with Joe's new, $10,000 lower, savings balance. However, the script that is processing Judy's deposit started at about the same time and began with the version of the savings file that had Joe's original balance. It eventually finishes and writes Judy's $5 higher balance but also overwrites Joe's new lower balance with the old high balance. Where does that leave you? $10,000 poorer, cooking on an old GE range, and wishing you had Concurrency Control.

After a few months of programming and reading operating systems theory books from the 1960s that deal with mutual exclusion, you've solved your concurrency problems. Congratulations. However, like any good Internet entrepreneur, you're running this business out of your house and you're getting a little sleepy. So you heat up some coffee in the microwave and simultaneously toast a bagel in the toaster oven. The circuit breaker trips. This is the time when you are going to regret having bought that set of Calphalon pots to go with your Viking stove rather than invest in an uninterruptible power supply for your server. You hear the sickening sound of disks spinning down. You scramble to get your server back up and don't really have time to look at the logs and notice that Joe User was back transferring $25,000 from savings to checking. What happened to Joe's transaction?

The good news for Joe is that your perl script had just finished crediting his checking account with $25,000. The bad news for you is that it hadn't really gotten started on debiting his savings account. You're so busy preparing the public offering for your on-line business that you fail to notice the loss. But your underwriters eventually do and your plans to sell the bank to the public go down the toilet.

Where does that leave you? Cooking on an old GE range and wishing you'd left the implementation of transactions to professionals.

What Do You Need for Transaction Processing?

Data processing folks like to talk about the "ACID Test" when deciding whether or not a database management system is adequate for handling transactions. An adequate system has the following properties:
Atomicity
Results of a transaction's execution are either all committed or all rolled back. All changes take effect, or none do. That means, for Joe User's money transfer, that both his savings and checking balances are adjusted or neither are.
Consistency
The database is transformed from one valid state to another valid state. This defines a transaction as legal only if it obeys user-defined integrity constraints. Illegal transactions aren't allowed and, if an integrity constraint can't be satisfied then the transaction is rolled back. For example, suppose that you define a rule that, after a transfer of more than $10,000 out of the country, a row is added to an audit table so that you can prepare a legally required report for the IRS. Perhaps for performance reasons that audit table is stored on a separate disk from the rest of the database. If the audit table's disk is off-line and can't be written then the transaction is aborted.
Isolation
The results of a transaction are invisible to other transactions until the transaction is complete. For example, if you are running an accounting report at the same time that Joe is transferring money, the accounting report program will either see the balances before Joe transferred the money or after, but never the intermediate state where checking has been credited but savings not yet debited
Durability
Once committed (completed), the results of a transaction are permanent and survive future system and media failures. If the airline reservation system computer gives you seat 22A and crashes a millisecond later, it won't have forgotten that you are sitting in 22A and also give it to someone else. Furthermore, if a programmer spills coffee into a disk drive, it will be possible to install a new disk and recover the transactions up to the coffee spill, showing that you had seat 22A.
That doesn't sound too tough to implement, does it? And, after all, one of the most refreshing things about the Web is how it encourages people without formal computer science backgrounds to program. So why not build your Internet bank on a transaction system implemented by an English major who has just discovered perl?

Because you still need indexing.

Finding Your Data (and Fast)

One facet of a database management system is processing inserts, updates, and deletes. This all has to do with putting information into the databasement. Sometimes it is also nice, though, to be able to get data out. And with popular sites getting 20 hits/second, it pays to be conscious of speed.

Flat files work OK if they are very small. A perl script can read the whole file into memory in a split second and then look through it to pull out the information requested. But suppose that your on-line bank grows to have 250,000 accounts. A user types his account number into a Web page and asks for his most recent deposits. You've got a chronological financial transactions file with 25 million entries. Crunch, crunch, crunch. Your server laboriously works through all 25 million to find the ones with an account number that matches the user's. While it is crunching, 25 other users come to the Web site and ask for the same information about their accounts.

You have two choices: (1) buy an 8-headed DEC Alpha with 2 GB of RAM; (2) build an index file. If you build an index file that maps account numbers to sequential transaction numbers, then your server won't have to searching all 25 million records anymore. However, you have to modify all of your programs that insert, update, or delete from the database to also keep the index current.

This works great until two years later when a brand new MBA arrives from Harvard. She asks your English major cum perl hacker, "now I'd like a report of all customers who have more than $5,000 in checking or live in Oklahoma and have withdrawn more than $100 from savings in the last 17 days." It turns out that you didn't anticipate this query so your indexing scheme doesn't speed things up. Your server has to grind through all the data over and over again.

Enter the Relational Database

You are a Web publisher. On the cutting edge. You need the latest and greatest in computer technology. That's why you use, uh, Unix. Yeah. Anyway, even if your operating system harks back to the 1960s, you definitely can't live without the most modern database management system available. Maybe this guy E.F. Codd can help:
"Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). ... Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

"Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model."

Sounds pretty spiffy, doesn't it? Just like what you need. That's the abstract to A Relational Model of Data for Large Shared Data Banks, a paper Codd wrote while working at IBM's San Jose research lab. It was published in the Communications of the ACM in June, 1970.

Yes, that's right, 1970. What you need to do is move your Web site into the 70s with one of these newfangled relational database management systems (RDBMS). Actually, as Codd notes in his paper, most of the problems we've encountered so far in this chapter were solved in the 1960s by off-the-shelf mainframe software sold by IBM and the "seven dwarves" (as IBM's competitors were known). In the early 1960s, businesses had gotten tired of losing important transactions and manually uncorrupting databases. They began to think that their applications programmers shouldn't be implementing transactions an indexing on an ad hoc basis for each new project. These systems worked fairly well but resulted in brittle data models. If you got your data representation correct the first time and your business needs never changed then a 1967-style hierarchical database was great. Unfortunately, if you put a system in place and subsequently needed new indices or a new data format then you might have to rewrite all of your applications programs.

From an application programmer's point of view, the big innovation in the relational database is that one uses a declarative query language, SQL, an acronym for Structured Query Language and pronounced "ess-cue-el" or "sequel". Most computer languages are procedural. The programmer tells the computer what to do, step by step, specifying a procedure. In SQL, the programmer says "I want data that meets the following criteria" and the RDBMS query planner figures out how to get it. There are two advantages to using a declarative language. The first is that the queries no longer depend on the data representation. The RDBMS is free to store the data however it wants. The second is increased software reliability. It is much harder to have "a little bug" in an SQL query than in a procedural program. Generally it either describes the data that you want and works all the time or it completely fails in an obvious way.

Another benefit of declarative languages is that less sophisticated users are able to write useful programs. For example, many computing tasks that required professional programmers in the 1960s can be accomplished by non-technical people with spreadsheets. In a spreadsheet, you don't tell the computer how to work out the numbers or in what sequence. You just declare "this cell will be 1.5 times the value of that other cell over there."

RDBMSes can run very very slowly. Depending on whether you are selling or buying computers, this may upset or delight you. Suppose that the system takes 30 seconds to return the data you asked for in your query. Does that mean you have a lot of data? That you need to add some indices? That the RDBMS query planning made some bad choices and needs some hints? Who knows? The RDBMS is an enormously complicated program that you didn't write and for which you don't have the source code. Each vendor has tracing and debugging tools that purport to help you, but the process is not simple. Good luck figuring out a different SQL incantations that will return the same set of data in less time. If you can't, call 1-800-DIGITAL and say "I'd like an 8-headed DEC Alpha with 2 GB of RAM". Alternatively, you could keep running the non-relational software you used in the 1960s, which is what the airlines do for their reservations systems.

How Does This RDBMS Thing Work?

Database researchers love to talk about relational algebra, n-tuples, normal form, natural composition, and throw around mathematical symbols. This patina of mathematical obscurity tends to distract your attention from their bad suits and boring personalities, but is of no value if you just want to use a relational database management system.

In fact, this is all you need to know to be a Caveman Database Programmer: a relational database is a big spreadsheet that several people can update simultaneously.

Each table in the database is one spreadsheet. You tell the RDBMS how many columns each row has. For example, in our mailing list database, the table would have two columns: "name" and "email". Each entry in the database consists of one row in this table. All the data in a column must be of the same type, e.g., integer, decimal, character string, date. One difference between a spreadsheet and an RDBMS is that the rows in an RDBMS are not ordered. You can have a column named "row_number" and ask the RDBMS to return the rows ordered according to the data in this column. But the row numbering is not implicit as it would be with Visicalc or its derivatives such as Lotus 1-2-3 and Excel. If there is a "row_number" column or some other unique identifier for rows in a table, then it is possible for a row in another table to refer to that row by including the value of the unique id.

Here's what some SQL looks like for the mailing list application:



create table mailing_list (
	email		text not null primary key,
	name		text
);

The table will be called "mailing_list" and will have two columns, both variable length character streams. We've added a couple of integrity constraints on the "email" column. The "not null" will prevent any program from inserting a row where name is specified but email is not. After all, the whole point of the system is to send people email so it isn't much use to have a name with no email address. The "primary key" tells the database that this column's value can be used to uniquely identify a row. That means that the system will reject an attempt to insert a row with the same email address as an existing row. This sounds like a nice feature, but it can have some unexpected performance implications. For example, every time anyone tries to insert a row into this table, the RDBMS will have to look at all the other rows in the table to make sure that there isn't already one with the same email address. For a really huge table, that could take minutes. If you had also asked the RDBMS to create an index for "mailing_list" on "email" then the check becomes almost instantaneous. However, the integrity constraint still slows you down because every update to the mailing_list table will also require an update to the index and there you'll be doing twice as many writes to the hard disk.

That is the joy and the agony of SQL. Inserting two innocuous looking words can cost you a factor of 1000 in performance. Then inserting a sentence (to create the index) can bring you back so that it is only a factor of 2 or 3.

Anyway, now that we've executed the Data Definition Language "create table" statement, we can move onto Data Manipulation Language: an insert.


insert into mailing_list (name, email)
values ('Philip Greenspun','philg@mit.edu');
Note that we specify into which columns we are inserting. That way if someone comes along later and does

alter table mailing_list add column phone_number text;
then our insert will still work. Note also that the string quoting character in SQL is a single quote. Hey, it was the 70s. Anyway, if you want to avoid humiliating yourself on USENET, remember that doubling the single quote will let you insert one into the database...

insert into mailing_list (name, email)
values ('Michael O''Grady','ogrady@fastbuck.com');
So you've done DDL and DML statements. At last you are ready to experience the awesome power of the third type of SQL statement: a SELECT. Want your data back?

select * from mailing_list;
If you were to type this query into a standard shell-style RDBMS client program, e.g., SQL*PLUS for Oracle or, in this case, MSQL for Illustra, here's what you'd get.

* select * from mailing_list;
-------------------------------------------
|email               |name                |
-------------------------------------------
|philg@mit.edu       |Philip Greenspun    |
|ogrady@fastbuck.com |Michael O'Grady     |
-------------------------------------------
2 rows selected
Now you realize that you want to also store phone numbers for these folks. Since this is the Internet, many of your users will have their pants hanging around their knees under the weight of their cell phones, beepers and other personal communication accessories. So if you just added work_phone and home_phone columns, you wouldn't be able to accomodate the wealth of information users might want to give you.

The clean database-y way to do this is to define another table.


create table phone_numbers (
	email		text not null,
	number_type	text check (number_type in ('work','home','cell','beeper')),
	number		text
);
Note that in this table the email column is not a primary key. That's because we want to allow multiple rows with the same email address. Note also that we do not store the user's name here. That would be redundant with the mailing_list table and potentially self-redundant as well, e.g., if "robert.loser@fastbuck.com" says he is "Robert Loser" when he types in his work phone and then "Rob Loser" when he puts in his beeper number, and "Bob Lsr" when he puts in his cell phone number while typing on his laptop's cramped keyboard.

insert into phone_numbers values ('ogrady@fastbuck.com','work','(800) 555-1212');
insert into phone_numbers values ('ogrady@fastbuck.com','home','(617) 495-6000');
insert into phone_numbers values ('philg@mit.edu','work','(617) 253-8574');
insert into phone_numbers values ('ogrady@fastbuck.com','beper','(617) 222-3456');
Note that we have used an evil SQL shortcut and not specified the columns into which we are inserting data. The system defaults to using all the columns in the order that they were defined. Except for prototyping and playing around, I don't recommend ever using this shortcut.

The first three inserts work fine, but what about the last one, where Mr. O'Grady misspelled "beeper"?


X23C00:integrity constraint violation: 
  Constraint Name _CHECK_CONSTRAINT_ON_INSERT_TO_public_phone_numbers_
We asked Illustra at table definition time to check (number_type in ('work','home','cell','beeper')) and it did. The database cannot be left in an inconsistent state.

Let's say you want all of our data out. Email, full name, phone numbers. The most obvious query to try is a join.


select * 
from mailing_list, phone_numbers;
---------------------------------------------------------------------------------
|email              |number_type  |number        |email        |name            |
---------------------------------------------------------------------------------
|ogrady@fastbuck.com|work         |(800) 555-1212|philg@mit.edu|Philip Greenspun|
|ogrady@fastbuck.com|home         |(617) 495-6000|philg@mit.edu|Philip Greenspun|
|philg@mit.edu      |work         |(617) 253-8574|philg@mit.edu|Philip Greenspun|
|ogrady@fastbuck.com|work         |(800) 555-1212|ogrady@fastb*|Michael O'Grady |
|ogrady@fastbuck.com|home         |(617) 495-6000|ogrady@fastb*|Michael O'Grady |
|philg@mit.edu      |work         |(617) 253-8574|ogrady@fastb*|Michael O'Grady |
---------------------------------------------------------------------------------
6 rows selected
Yow! What happened? There are only two rows in the mailing_list table and three in the phone_numbers table. Yet here we have six rows back. This is how joins work. They give you the Cartesian product of the two tables. Each row of one table is paired with all the rows of the other table in turn. So if you join an N-row table with an M-row table, you get back a result with N*M rows. In real databases, N and M can be up in the millions so it is worth being a little more specific as to which rows you want:

select * 
from mailing_list, phone_numbers
where mailing_list.email = phone_numbers.email;
---------------------------------------------------------------------------
|email        |number_type  |number        |email        |name            |
---------------------------------------------------------------------------
|philg@mit.edu|work         |(617) 253-8574|philg@mit.edu|Philip Greenspun|
|ogrady@fastb*|work         |(800) 555-1212|ogrady@fastb*|Michael O'Grady |
|ogrady@fastb*|home         |(617) 495-6000|ogrady@fastb*|Michael O'Grady |
---------------------------------------------------------------------------
3 rows selected
Probably more like what you had in mind. Refining your SQL statements in this manner can sometimes be more exciting. For example, let's say that you want to get rid of Philip Greenspun's phone numbers but aren't sure of the exact syntax.

delete from phone_numbers;
3 rows deleted
Oops. Yes, this does actually delete all the rows in the table. You probably wish you'd typed

delete from phone_numbers where email = 'philg@mit.edu';
but it is too late now. I guess there is one more SQL statement that is worth learning. Suppose that I move to Hollywood to realize my long-standing dream of becoming a major motion picture producer. Clearly a change of name is in order, though, I'd be reluctant to give up the email address I've had since 1976. Here's the SQL:

* update mailing_list set name = 'Phil-baby Greenspun' where email = 'philg@mit.edu';

one row updated

* select * from mailing_list;
-------------------------------------
|email              |name           |
-------------------------------------
|ogrady@fastbuck.com|Michael O'Grady|
|philg@mit.edu      |Phil-baby Gree*|
-------------------------------------
2 rows selected
As with DELETE, I don't recommend playing around with UPDATE statements unless you have a WHERE clause at the end.

SQL the Hard Way

So you've gotten your pathetic Web site up and running and are proud of yourself for your wimpy little SELECTs. You had planned to live on the advertising revenue from your site, but find that $1.37/month doesn't go very far in Manhattan. So you say "At least I've become a database wizard; I can work in the financial industry." The next day, you're interviewing for that $200,000/year database hacking job at CitiCorp. Everything is going smoothly until they ask you how you'd investigate a performance problem with a "self-join with subquery". Self-join? Subquery? Maybe you have a little more to learn about RDBMS. Here's an example drawn from my real-life suffering with Illustra.

My site would hand out a unique session key to every new user who arrived at the site. These keys were just an integer sequence. I realized that I could keep track of how many users came to the site by simply inserting an audit row every day showing the value of the session key generator. Here's the history table I created:


* create table history (
	sample_time	timestamp,	-- when did the cookie have the value
	sample_value	integer
);

* select * from history order by sample_time;
------------------------------------------
|sample_time               |sample_value |
------------------------------------------
|1996-02-15 19:02:08.000000|75000        |
|1996-02-16 19:02:08.000000|76000        |
|1996-02-17 19:02:08.000000|77000        |
|1996-02-18 19:02:08.347617|77276        |
------------------------------------------

I knew that I had the information I needed and waited a few days to write the trivial NaviServer Tcl script to barf out a report page. It turns out to be easy to extract the rows of a table in order, as in the last SELECT above. However, it is impossible for a row to refer to "the last row in this SELECT". I could have written a Tcl procedure to walk through the rows in order, just setting things up on the first pass through the loop and then doing the appropriate subtraction for subsequent rows. However, Tcl doesn't have primitives that understand SQL timestamps. If I'd been using Oracle, I could have written this in PL/SQL, a language with all the procedural expressiveness of Tcl plus all the datatypes and functions of SQL. But I was using Illustra so I had to resort to classical SQL techniques.

If the history had N rows, I needed an interval table with N-1 rows. Each row would have the start time, end time, time interval, and cookie interval. Since I needed information from two different rows in the database, the most basic way to get it was with a JOIN. Since there was only one table, though, this would have to be a self-join.

* select h1.sample_time, h2.sample_time
from history h1, history h2;
-------------------------------------------------------
|sample_time               |sample_time               |
-------------------------------------------------------
|1996-02-15 19:02:08.000000|1996-02-15 19:02:08.000000|
|1996-02-16 19:02:08.000000|1996-02-15 19:02:08.000000|
|1996-02-17 19:02:08.000000|1996-02-15 19:02:08.000000|
|1996-02-18 19:02:08.347617|1996-02-15 19:02:08.000000|
|1996-02-15 19:02:08.000000|1996-02-16 19:02:08.000000|
|1996-02-16 19:02:08.000000|1996-02-16 19:02:08.000000|
|1996-02-17 19:02:08.000000|1996-02-16 19:02:08.000000|
|1996-02-18 19:02:08.347617|1996-02-16 19:02:08.000000|
|1996-02-15 19:02:08.000000|1996-02-17 19:02:08.000000|
|1996-02-16 19:02:08.000000|1996-02-17 19:02:08.000000|
|1996-02-17 19:02:08.000000|1996-02-17 19:02:08.000000|
|1996-02-18 19:02:08.347617|1996-02-17 19:02:08.000000|
|1996-02-15 19:02:08.000000|1996-02-18 19:02:08.347617|
|1996-02-16 19:02:08.000000|1996-02-18 19:02:08.347617|
|1996-02-17 19:02:08.000000|1996-02-18 19:02:08.347617|
|1996-02-18 19:02:08.347617|1996-02-18 19:02:08.347617|
-------------------------------------------------------
16 rows selected

A note about syntax is in order here. In an SQL FROM list, one can assign a correlation name to a table. In this case, I assign h1 and h2 to the two copies of history from which I am selecting. Then I can refer to h1.sample_time and get "the sample_time column from the first copy of the history table."

The main problem with this query, though, has nothing to do with syntax. It is the fact that I have 13 rows too many. Instead of N-1 rows, I specified the Cartesian product and got NxN rows. I've successfully done a self-join and gotten all the pairings I need, but now I must specify which pairings are legal.


* select h1.sample_time as s1, 
       h2.sample_time as s2
from history h1, history h2
where s2 > s1;
-------------------------------------------------------
|s1                        |s2                        |
-------------------------------------------------------
|1996-02-15 19:02:08.000000|1996-02-16 19:02:08.000000|
|1996-02-15 19:02:08.000000|1996-02-17 19:02:08.000000|
|1996-02-16 19:02:08.000000|1996-02-17 19:02:08.000000|
|1996-02-15 19:02:08.000000|1996-02-18 19:02:08.347617|
|1996-02-16 19:02:08.000000|1996-02-18 19:02:08.347617|
|1996-02-17 19:02:08.000000|1996-02-18 19:02:08.347617|
-------------------------------------------------------
6 rows selected
Note first that I've given correlation names to the columns as well. Not only is my report be labelled with "s1" and "s2" but I also use these as shorthand in the WHERE clause, which states that we only want intervals where s2 is later than s1. That kills off 10 of the rows from the Cartesian product but there are still three unwanted rows, e.g., the pairing of 1996-02-15 and 1996-02-18. I only want the pairing of 1996-02-15 and 1996-02-16. I can specify that with another WHERE clause:

select h1.sample_time as s1, 
       h2.sample_time as s2
from history h1, history h2
where s2 > s1
and s2 = (select min(h3.sample_time) 
          from history h3
          where h3.sample_time > h1.sample_time)
order by s1;

-------------------------------------------------------
|s1                        |s2                        |
-------------------------------------------------------
|1996-02-15 19:02:08.000000|1996-02-16 19:02:08.000000|
|1996-02-16 19:02:08.000000|1996-02-17 19:02:08.000000|
|1996-02-17 19:02:08.000000|1996-02-18 19:02:08.347617|
-------------------------------------------------------
3 rows selected
Note that I am now asking the database, for each of the 6 rows, to do a subquery:

select min(h3.sample_time) 
from history h3
where h3.sample_time > h1.sample_time
This will scan the history table yet again to find the find the oldest sample that is still newer than s1. In the case of an unindexed history table, this query should probably take an amount of time proportional to then number of rows in the table cubed (N^3). If we'd done this procedurally, it would have taken time proportional to N*log(N) (the limiting factor being the sort for the ORDER BY clause). There are a couple of lessons to be learned here: (1) sometimes declarative languages can be difficult to use and vastly more inefficient than procedural languages; (2) it is good to have a fast database server.

When I knew that I had the rows that I wanted, I added the trivial syntax to the SELECT list to subtract the times and cookie values.


select h1.sample_time as s1, 
       h2.sample_time as s2, 
       h2.sample_time - h1.sample_time as gap_time,
       h2.sample_value - h1.sample_value as gap_cookie
from history h1, history h2
where s2 > s1
and s2 = (select min(h3.sample_time) 
          from history h3
          where h3.sample_time > h1.sample_time)
order by s1;
-----------------------------------------------------------------------------------
|s1                        |s2                        |gap_time     |gap_cookie   |
-----------------------------------------------------------------------------------
|1996-02-15 19:02:08.000000|1996-02-16 19:02:08.000000|0-0          |1000         |
|1996-02-16 19:02:08.000000|1996-02-17 19:02:08.000000|0-0          |1000         |
|1996-02-17 19:02:08.000000|1996-02-18 19:02:08.347617|0-0          |276          |
-----------------------------------------------------------------------------------
3 rows selected
Illustra had decided that my time intervals were "0 years and 0 months" in length. Yes, thanks, that was just what I wanted to know. I spent a few hours trying various incantations and casts and spells but ultimately gave up. Nobody at Illustra support was ever able to make this query work either.

So before you apply for that $200,000/year database job, remember that formulating SQL queries can be an art, you'll need time and experience to get good at thinking declaratively, and RDBMSes do not always work as advertised.

Brave New World

Training an African Grey parrot to function as an information systems manager can be very rewarding. The key sentence is "We're pro-actively leveraging our object-oriented client/server database to target customer service during reengineering." In the 1980s db world, the applicable portion of this sentence was "client/server" (see next chapter). In the Brave New World of database management systems, the key phrase is "object-oriented."

Object systems contribute to software reliability and compactness by allowing programmers to factor their code into chunks that are used as widely as possible. For example, suppose that you are building a catalog Web site to sell magazines, videos, books, and CDs. It might be worth thinking about the data and functions that are common to all of these and encapsulating them in a product class. At the product level, you'd define characteristics such as product_id, short_name, description. Then you'd define a magazine subclass that inherited all the behavior of product and added things like issues_per_year.

Programmers using modern computer languages like Smalltalk and Lisp have been doing this since the mid-1970s but the idea has only recently caught on in the RDMBS world. Here are some table definitions for the Illustra system:

create table products of new type product_t
( 
	product_id		integer not null primary key,
	short_name		text not null,
	description		text
);
Here's how we define new types and tables that inherit from products...
create table magazines of new type magazine_t (
	issues		integer not null,
	foreign_postage decimal(7,2),
	canadian_postage decimal(7,2)
)
under products;

create table videos of new type video_t (
	length_in_minutes	integer
)
under products;
Our data model is defined, so we can load some data.
* insert into magazines (product_id,short_name,description,issues) 
values (0,'Dissentary','The result of merging Dissent and Commentary',12);
* insert into videos (product_id,short_name,description,length_in_minutes) 
values (1,'Sense and Sensibility','Chicks dig it',110);
* select * from products;
---------------------------------------------------
|product_id   |short_name           |description  |
---------------------------------------------------
|1            |Sense and Sensibility|Chicks dig it|
|0            |Dissentary           |The result o*|
---------------------------------------------------
Suppose that our pricing model is that magazines cost $1.50/issue and videos cost $0.25/minute. We want to hide these decisions from programs using the data.
create function find_price(product_t) returns numeric with (late) 
as 
return 5.50;
So a generic product will cost $5.50.
create function find_price(magazine_t) returns numeric
as 
return $1.issues * 1.50;
create function find_price(video_t) returns numeric
as 
return  $1.length_in_minutes * 0.25;
The appropriate version of the function find_price will be invoked depending on the type of the row.
* select short_name, find_price(products) from products;
---------------------------------------
|short_name           |find_price     |
---------------------------------------
|Sense and Sensibility|          27.50|
|Dissentary           |          18.00|
---------------------------------------
This doesn't sound so impressive, but suppose you also wanted a function to prepare a special order code by concatenating product_id, price, and the first 5 characters of the title.
create function order_code(product_t) returns text
as
return $1.product_id::text || 
       '--' || 
       trim(leading from find_price($1)::text) || 
       '--' || 
       substring($1.short_name from 1 for 5);

* select order_code(products) from products;
-----------------
|order_code     |
-----------------
|1--27.50--Sense|
|0--18.00--Disse|
-----------------
This function, though trivial, is already plenty ugly. The fact that the find_price function dispatches according to the type of its argument allows a single order_code to be used for all products.

The Brave New World sounds great in DBMS vendor brochures, but the database folks have only recently gotten the object-oriented religion and they never met Dave Moon. Who is Dave Moon? One of the world's best programmers, a pioneer in modern object systems, but alas not a patient man in his youth.

Moon was one of the chief architects of the MIT Lisp Machine, the world's easiest to program computer. It did things in 1978 that, if we are lucky, will be announced as innovations by Microsoft in the year 2005. A tiny company called Symbolics was spun out of MIT to commercialize the Lisp Machine and Moon was one of the founders. I was working there in 1984 when the company moved into a new building next to MIT. The facilities manager sent around some email telling people not to tape posters to their office walls because we'd be moving to bigger quarters in a few years and didn't want the landlord to charge us for excessive wear. That night, the Foonly crashed. A Foonly was a clone of the PDP-10, a mainframe computer designed by Digital in the 1960s. MIT and Stanford people loved the PDP-10 but couldn't afford DEC's million dollar price tags. So there were these guys in a basement in California smoking dope and wirewrapping clones that were 1/3 the speed and 1/20th the cost. Nobody ever figured out why they called the machines "Foonlies".

Moon was a superb hardware engineer and nobody doubted that he would get the Foonly up and running. Still, people were a bit surprised when a huge steel cylinder came crashing through the machine room wall. The cause of the crash had been one of those washing-machine-sized Control Data T-300 disk packs. The cylindrical missile had been the spindle holding together the bad 12" platters. Moon had hurled it through the wall after determining its guilt in the crime of the Foonly crash. I went back to my office and taped up a poster.

This story illustrates that great programmers are not necessarily patient. One of the things that drove them crazy about the object systems of the 1970s (Smalltalk, Lisp Machine Flavors) was that if you changed a class definition, the existing instances of that class did not get modified. You'd have to restart your program, maybe even reboot your computer, if you changed your mind about how to represent something. You could lose 20 minutes or even more.

Thus the object systems of the 1980s, e.g., Common Lisp Object System, were designed to touch up running instances of classes if the class definition changed. At press-time, the only non-vaporware object-flavored relational database is Illustra 3.2. I built myself a beautiful table hierarchy more or less like what I've described above. Then six months later I needed to add a column to the products table. E.F. Codd understood back in 1970 that data models had to grow as business needs change. But the Illustra folks were so excited by their object extensions that they forgot. The system couldn't add a column to a table with dependent subclasses. What should I do, I asked the support folks? "Dump the data out of all of your tables, drop all of them, rebuild them with the added column, then load all of your data back into your tables." Uh thanks, I guess I could be back on-line by 1998...

I had high hopes for the merged Informix/Illustra system. A year had passed since I'd pointed out that table inheritance was useless because of the brittleness of the resulting data models. So I figured surely they would have plugged this hole in the feature set. But it turned out that the new "solution to all of your problems" server has the same limitation. I guess in the end it is much easier to hype 21st century features than actually sit down and implement features from two decades back.

Braver New World

Oracle Release 8 is supposed to have some object extensions but aside from "will be released sometime in 1997", the company isn't saying much about the product. In my experience, the Oracle system is very good about letting you change your data model, much better than Illustra. For example, you can change a column's type, add constraints, and drop a column. So here's hoping that they will deliver the benefits of table definition inheritance without locking them into their original data model forever.

If you really want to be on the cutting edge, you can use a bona fide object database, like Object Design's ObjectStore. These persistently store the sorts of object and pointer structures that you create in a Smalltalk, Common Lisp, C++, or Java program. Chasing pointers and certain kinds of transactions can be 10 to 100 times faster than in a relational database. If you believed everything in the object database vendors' literature, then you'd be surprised to see Larry Ellison roaring past in his Acura NSX. Oracle should have been crushed long ago under the weight of this superior technology, introduced with tremendous hype in the mid-1980s. Ellison ought to be back teaching high school science.

After 10 years, the market for object database management systems is about $100 million/year, 1/50th the size of the relational database market. Object databases bring back some of the bad features of 1960s pre-relational database management systems. The programmer has to know a lot about the details of data storage. If you know the identities of the objects you're interested in, then the query is fast and simple. But it turns out that most database users don't care about object identities; they care about object attributes. Relational databases tend to be faster and better at coughing up aggregations based on attributes. That said, I'm surprised that object databases aren't more popular. My personal theory is that the ascendance of C++ is responsible for the floundering of object databases. Anyone intelligent can quickly write a highly reliable program in Smalltalk or Common Lisp. But the world embraced C++, a language in which almost nobody has ever managed to write a reliable program. Corporations tend to be conservative about databases, which can be their most valuable assets. They never developed enough trustworthy C++ applications to make an object database worth buying and hence they continued to program in SQL.

Java to some extent restores programmers to where they were in 1978 with their Xerox Smalltalk environment or MIT Lisp Machine. Since Java seems to have enough money behind it to catch on and object databases are very naturally suited to backing up Java programs, I predict an increase in object database popularity.

Choosing an RDBMS Vendor

All the RDBMS vendors claim to understand exactly how a winning Web site should be built. You give them your money and they'll tell you what time it is on the Web. It all sounds really plausible until you look at the slow, content-free Web sites they've built for themselves. None of the Web crawlers were built by any of the database vendors and, so far as I know, none of the Web crawlers even use any software made by the RDBMS vendors. As of 1996, five years after the Web was established, only Sybase had figured out how to put their documentation on the Web.

What do you need out of an RDBMS? Although it isn't theoretically required that the SQL language come packaged with a relational database management system, it turns out that all commercial systems include SQL. SQL is more or less standard so that you can mechanically port 95% of your code from, say, Oracle to Informix to Sybase. I think that what really distinguishes a database for use behind a collaborative Web site is its ability to build a full-text index on the strings stored in the database. This is not part of standard SQL.

Suppose that a user says he wants to find out information on "dogs". If you had a bunch of strings in the database, you'd have to search them with a query like


select * from magazines where description like '%dogs%';
This requires the RDBMS to read every row in the table, which is slow. Also, this won't turn up magazines whose description includes the word "dog". Finally, the modern text search engines are very smart about how words related. So they might deliver a document that did not contain the word "dog" but did contain "Golden Retriever". This makes services like classified ads, discussion forums, etc., much more useful to users.

Unfortunately, as far as I know, the only relational database management systems that incorporate full text search engines are Oracle with its Context option and Illustra/Informix with its PLS and Verity Blades. This is the feature that makes them really worth looking at compared to the 211 other vendors listed in Yahoo.

Paying an RDBMS Vendor

This is the part that hurts. The basic pricing strategy of database management system vendors is to hang the user up by his heels, see how much money falls out, take it all and then ask for another $50,000 for "support". Ideally, they'd like to know how much your data is worth and how much profit you expect to make from making it available and then extract all of that profit from you. In this respect, they behave like the classical price-discriminating profit-maximizing monopoly from Microeconomics 101.

You can verify this by trying to get a price from an RDBMS vendor. Go ahead. You won't find prices on their Web sites. You have to talk to a salesman who tries to figure out how much you're good for. Classically, they've priced these things per user. If you had a big company with 100 operators then you paid more than a small company with 5 operators. Oracle has 70% of the market and tends to be the leader in terms of setting prices. Suppose that you are Joe Smalltime with a Pentium box in your basement. You have a site with 17 hits/day. After a couple of hours with the Oracle Web site and 800 number, you will eventually be told that, since they can't tell how many users you have at any one time, you need an "unlimited users single processor license". That will be $64,000 please. Plus another $60,000 or so for ConText. However, if they see you walking away from the sale and getting into your beat-up Toyota and driving away, then they'll decide that $16,000 is better than nothing so they'll allow you informally to run with a minimum 8-user license. I've gone through this with a couple of friends of mine.

Microsoft is the only vendor that actually has a price. You can go to their Web site and find a price for SQL Server. The explain clearly that if you are using it for the Internet, they want you to buy an Internet Connector for $3000 plus pay another $1400 for the database (prices on October 3,1996). I'm not sure if this is the best deal in the world, but at least you can find the price in less than two days.

Performance

According to their sales staff, Informix has "by far the fastest and most scalable core database engine in the world." This has an oddly familiar ring to it if you've just heard the Oracle guy note that "everyone agrees that the Oracle core code is the world's fastest and most scalable on multiprocessor machines." And if you go to the Sybase web site, you'll see their "Sybase System 11 has proved to be the leader in performance across a wide variety of hardware platforms." I suspect that somebody is not telling the truth.

Be assured that any RDBMS product will be plenty slow. I couldn't believe that my Illustra system was capable of only about 7 inserts/second into a three-column indexed table. This was on a 70 MIPS machine. Ten million instructions/insert! So I tried it with the latest and greatest Oracle 7.3 system. It could do 30/second.

There are several ways to achieve very high performance. One is to buy a huge multi-processor computer with enough RAM to hold your entire data model at once. Unfortunately, unless you decide to become a Microsoft SQL Server Achiever, your RDBMS vendor will give your bank account a reaming that it will not soon forget. The license fee will be four times as much for a four CPU machine as for a one CPU machine. When the basic fee is $60-100,000, multiplying by four can result in a truly staggering figure. Thus it might be best to try to get hold of the fastest possible single CPU computer.

If you are processing a lot of transactions, your bottleneck will be disk spindle contention anyway. The solution to this is to chant "Oh what a friend I have in Seagate." Disks are slow. Very slow. Literally almost one million times slower than the computer. Thus the computer spends a lot of time waiting for the disk(s). You can speed up SQL SELECTs simply by buying so much RAM that the entire database is in memory. However, the Durability requirement in the ACID test for transactions means that some record of a transaction will have to be written to a medium that isn't erased in the event of a power failure. If a disk can only do 100 seeks/second and you only have one disk, your RDBMS is going to be hard pressed to do more than about 100 updates/second.

The first thing you do is mirror all of your disks. If you don't have the entire database in RAM, this speeds up SELECTs because the disk controller can read from whichever disk is closer to the desired track. The opposite effect can be achieved if you use "RAID level 5" where data is striped across multiple disks. Then the RDBMS has to wait for four disks to seek before it can cough up a single row. So mirroring, or "RAID level 0", is what you want.

Once fully mirrored, you don't have to worry about media failure. If a disk dies, you just plug in a spare and experience zero off-the-Web time. However, you may still want snapshots of your database in case someone gets a little excited with a DELETE FROM statement. Or in case your facility is torched by a distraught user. The way the Oracle studs at Boston Children's Hospital do this with their 60 GB database is to break the mirror, back up from the disks that are off-line as far as the database is concerned, and then reestablish the mirror. For both performance and recovery reasons, they presumably keep their redo log on a separate disk from the rest of the database so they probably don't even risk losing the transactions that occur when the mirror is broken.

All you have to decide now is "How many disks?" The Oracle DBA Handbook recommends a 7x2 disk configuration as a minimum compromise for a machine doing nothing but database service. Their solutions start at 9x2 disks and go up to 22x2. The idea is to keep files that might be written in parallel on separate disks so that one can do 2200 seeks/second instead of 100.

Even if you have lots of disks, you have to be very thoughtful about how you lay your data out across them. Oracle's installation procedure forces you to think about where your data files should go. On a computer with one disk, this is merely annoying and keeps you from doing development. But the flexibility is there because you know which of your data areas tend to be accessed simultaneously and the computer doesn't. So if you do have a proper database server with a rack of disk drives, an intelligent manual layout can result in a factor of 5 in increased performance.

Don't forget to back up

Be afraid. Be very afraid. Standard Unix or Windows NT file system backups will not leave you with a consistent and therefore restoreable database on tape. Suppose that your RDBMS is storing your database in two separate Unix filesystem files, foo.db and bar.db. Each of these files is 200 MB in size. You start your backup program running and it writes the file foo.db to tape. As the backup is proceeding, a transaction comes in that requires changes to foo.db and bar.db. The RDBMS makes those changes, but the ones to foo.db occur to a portion of the file that has already been written out to tape. Eventually the backup program gets around to writing bar.db to tape and it writes the new version with the change. Your system administrator arrives at 9 am and sends the tapes via courier to an off-site storage facility.

At noon, an ugly mob of users assembles outside your office, angered by your introduction of frames and failure to include WIDTH and HEIGHT tags on IMGs. You send one of your graphic designers out to explain how "cool" it looked when run off a local disk in a demo to the Vice-President. The mob stones him to death and then burns your server farm to the ground. You manage to pry your way out of the rubble with one of those indestructible HP Unix box keyboards. You manage to get the HP disaster support people to let you use their machines for awhile and confidently load your backup tape. To your horror, the RDBMS chokes up blood following the restore. It turned out that there were linked data structures in foo.db and bar.db. Half of the data structures (the ones from foo.db) are the "old pre-transaction version" and half are the "new post-transaction version" (the ones from bar.db). One transaction occurring during your backup has resulted in a complete loss of availability for all of your data. Maybe you think that isn't the world's most robust RDBMS design but there is nothing in the SQL standard or manufacturer's documentation that says Oracle, Sybase, or Informix can't work this way.

There are two ways to back up a relational database. You can shut it down, thus preventing transactions from occurring. Then back up the Unix or NT filesystem files (not usually recommended by the vendors but it tends to work) or use a simple utility provided by the vendor to make a dump file. This is called an "off-line backup" and is typically used by insurance companies and other big database users who only need to do transactions 8 hours/day. The Children's Hospital mirror-breaking example that I mentioned earlier in this chapter is another way of accomplishing the same thing.

Each RDBMS vendor has an advertised way of doing on-line backups. It can be as simple as "call this function and we'll grind away for a couple of hours building you a dump file that contains a consistent database but minus all the transactions that occurred after you called the function." That's what the Illustra 2.4 documentation said. But when I tested it by restoring from the dump file, a table that had been the subject of 10 transactions during the dump came out broken. The dump file was not consistent. It took a year of wrangling with tech support, the testing of about 10 "fixed binaries", and a new major release of the software (3.2) before the on-line backups would restore into legal databases.

The lessons here are several. First, whatever your backup procedure, make sure you test it with periodic restores. Second, remember that the backup and maintenance of an RDBMS is done by a full-time staffer at most companies, called "the dba", short for "database administrator". If the software worked as advertised, you could expect a few days of pain during the install and then periodic recurring pain to keep current with improved features. However, dba's earn their moderately lavish salaries. No amount of marketing hype suffices to make a C program work as advertised. That goes for an RDBMS just as much as for a word processor. So coming to terms with bugs, which mostly means finding workarounds since vendors are notoriously sluggish with fixes, can be a full-time job at a large installation. Another full-time job is hunting down users who are doing queries that are taking 1000 times longer than necessary because they forgot to build indices, etc. Children's Hospital has three full-time dbas and they work hard.

If all of this sounds rather tedious just to ensure that your data is still around tomorrow, then you might be cheered by the knowledge that Oracle dbas are always in high demand and start at $60,000 to $80,000/year. When the Web bubble bursts and your friends who are "HTML programmers" are singing in the subway, you'll be kicking back at some huge financial services firm.


Text and pictures copyright 1996 Philip Greenspun.
philg@mit.edu