
Appendix A: Setting up your own RDBMS

by Philip Greenspun, part of SQL for Web Nerds

This book was written for students at MIT who have access to our Web/db development systems; they are the first category of reader who can use it painlessly. The second category is the corporate slave who works at Bloatco, Inc., where there is a professionally maintained Oracle server. If you are unfortunate enough to fall outside those categories, you might need to install and maintain your own RDBMS. This appendix is intended to help you choose and run an RDBMS that will be good for learning SQL and that will let you grow into running production Web sites.

Choosing an RDBMS Vendor (Quick and Dirty)

The quick and dirty way to choose a database management system is to start from a list of products that seem viable in the long term. Basically you can choose from among three: Oracle, PostgreSQL, and Microsoft SQL Server. If you want to ignore the RDBMS and concentrate your energy on attacking higher-level application challenges, Oracle is the best choice. It is the most powerful and feature-rich RDBMS. You can run the same software on a $500 PC or in a $5 million multiply redundant server configuration.

PostgreSQL is an interesting alternative. It is free and open-source. Like Oracle, it has optimistic locking (writers need not wait for readers; readers need not wait for writers). PostgreSQL can be easier to install and maintain than Oracle. PostgreSQL was built from the ground up as an object-relational database and offers some important features that Oracle still lacks. Business folks who are more concerned with support, reliability, and the possibility of redundancy may question your choice of PostgreSQL, however. See www.postgresql.org for the latest on this rapidly evolving system.

Microsoft SQL Server is an uninteresting alternative. Microsoft started with the source code from Sybase and has gradually improved the product. The system can be problematic for Web use because of its traditional pessimistic locking architecture. If you hire a new programmer and he or she executes a slow-to-return query, users won't be able to update information, place orders, or make comments until the query completes. In theory the management of these locks can be manually adjusted but in practice Web programmers never have the time, ability, or inclination to manage locks properly. SQL Server is generally behind Oracle in terms of features, e.g., the ability to run Java inside the database, SQL extensions that are convenient for data warehousing, or layered products that help organizations with extreme performance or reliability demands. All of this said, SQL Server probably won't disappear because Microsoft has so much power in a large portion of the server world. So if you're part of an organization that is 100 percent Microsoft and people are already skilled at maintaining SQL Server, it is a reasonable technical decision to continue to use it.

Choosing an RDBMS Vendor (From First Principles)

Here are the factors that we think are important in choosing an RDBMS to sit behind a Web site:
  1. cost/complexity to administer
  2. lock management system
  3. full-text indexing option
  4. maximum length of VARCHAR data type
  5. ease of running standard programming languages internally
  6. support

Cost/Complexity to Administer

Sloppy RDBMS administration is one of the most common causes of downtime at sophisticated sites. If you don't have an experienced staff of database administrators to devote to your site, you should consider either outsourcing database administration or running a simple RDBMS such as PostgreSQL.

Lock Management System

Relational database management systems exist to support concurrent users. If you didn't have 100 people simultaneously updating information, you'd probably be better off with a Perl script than a commercial RDBMS (i.e., 100 MB of someone else's C code).

All database management systems handle concurrency problems with locks. Before an executing statement can modify some data, it must grab a lock. While this lock is held, no other simultaneously executing SQL statement can update the same data. Furthermore, to prevent another user from reading half-updated data, no simultaneously executing SQL statement can even read the data while the lock is held.

Readers must wait for writers to finish writing. Writers must wait for readers to finish reading.

This kind of system is simple to implement, works great in the research lab, and can be proven correct mathematically. The only problem with it? It doesn't work. Sometimes it doesn't work because of a bug: a particular RDBMS's implementation of this scheme gets confused and stuck when there are a bunch of users. More often it doesn't work because pessimistic locking is itself the bug. A programmer writes an hour-long back-end query and forgets that by doing so he or she will cause every updating page on the Web site to wait for the full hour.
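To make the failure mode concrete, here is a sketch (the orders table and its values are hypothetical) of two sessions running against an RDBMS that takes shared read locks:

-- Session A: marketing's hour-long report;
-- takes read locks as it scans the table
select customer_id, sum(amount)
from orders
group by customer_id;

-- Session B: a user trying to place an order at the same time;
-- the insert blocks until Session A's query completes
insert into orders (order_id, customer_id, amount)
values (12346, 789, 75.00);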

With the Oracle RDBMS, readers never wait for writers and writers never wait for readers. If a SELECT starts reading at 9:01 and encounters a row that was updated (by another session) at 9:02, Oracle reaches into a rollback segment and digs up the pre-update value for the SELECT (this preserves the Isolation requirement of the ACID test). A transaction does not need to take locks unless it is modifying a table and, even then, only takes locks on the specific rows that are to be modified.
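Here is the same pair of sessions sketched under Oracle's multi-version scheme, again with a hypothetical orders table:

-- Session A, 9:01: starts the hour-long report
select customer_id, sum(amount)
from orders
group by customer_id;

-- Session B, 9:02: updates and commits immediately; nobody waits
update orders set amount = 80.00 where order_id = 12345;
commit;

-- Session A's report still sees the row as it stood at 9:01,
-- reconstructed from the rollback segment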

This is the kind of RDBMS locking architecture that you want for a Web site. Oracle and PostgreSQL offer it.

Full-text Indexing Option

Suppose that a user says he wants to find out information on "dogs". If you had a bunch of strings in the database, you'd have to search them with a query like
select * from magazines where description like '%dogs%';
This requires the RDBMS to read every row in the table, which is slow. Also, this won't turn up magazines whose description includes the word "dog".

A full-text indexer builds a data structure (the index) on disk so that the RDBMS no longer has to scan the entire table to find rows containing a particular word or combination of words. The software is smart enough to be able to think in terms of word stems rather than words. So "running" and "run" or "dog" and "dogs" can be interchanged in queries. Full-text indexers are also generally able to score a user-entered phrase against a database table of documents for relevance so that you can query for the most relevant matches.

Finally, modern text search engines are very smart about how words relate. So they might deliver a document that did not contain the word "dog" but did contain "Golden Retriever". This makes services like classified ads, discussion forums, etc., much more useful to users.

Relational database management system vendors are gradually incorporating full-text indexing into their products. Sadly, there is no standard for querying using this index. Thus, if you figure out how to query Oracle 8.1 with ConText for "rows relating to 'running' or its synonyms", the SQL syntax will not be useful for asking the same question of Microsoft SQL Server 7.0 with its corresponding full-text indexing option.
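For example, with Oracle 8.1's interMedia Text (the renamed ConText option), the index and a stem-expanded query look roughly like this (a sketch from memory; check the documentation for your release):

create index magazines_desc_idx on magazines(description)
  indextype is ctxsys.context;

-- '$dogs' asks the engine to match on the word stem,
-- so descriptions containing "dog" turn up too
select * from magazines
where contains(description, '$dogs') > 0;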

My best experiences have been with the Illustra/PLS combination. I fed it 500 short classified ads for photography equipment, then asked "What word is most related to 'Nikon'?" The answer according to Illustra/PLS: Nikkor (Nikon's brand name for lenses).

Maximum Length of VARCHAR Data Type

You might naively expect a relational database management system to provide abstraction for data storage. After defining a column to hold a character string, you'd expect to be able to give the DBMS a ten-character string or a million-character string and have each one stored as efficiently as possible.

In practice, current commercial systems are very bad at storing unexpectedly long data, e.g., Oracle only lets you have 4,000 characters in a VARCHAR. This is okay if you're building a corporate accounting system but bad for a public Web site. You can't be sure how long a user's classified ad or bulletin board posting is going to be.

Modern database vendors typically provide a character large object (CLOB) data type. A CLOB theoretically allows you to store arbitrarily large data. However, in practice there are so many restrictions on a CLOB column that it isn't very useful. For example, with Oracle 8i you can't use a CLOB in a SQL WHERE clause and thus the preceding "LIKE '%dogs%'" would fail. You can't build a standard index on a LOB column. You may also have a hard time getting strings into or out of a LOB. The Oracle SQL parser only accepts string literals up to 4,000 characters in length. After that, you'll have to use special C API calls.

LOBs will also give your Oracle database administrator fits: they break the semantics of EXPORT and IMPORT. At least as of Oracle 8.1.6, if you export a database containing LOBs you won't be able to import it to another Oracle installation unless that installation happens to have a tablespace with the same name as the one where the LOBs were stored in the exported installation.
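A sketch of the problem, using a hypothetical bulletin-board table:

create table bboard (
  posting_id  integer primary key,
  message     clob
);

-- fails on Oracle 8i: LIKE does not work on LOB columns
select * from bboard where message like '%dogs%';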

PostgreSQL has a "text" data type that theoretically has no limit. However, an entire PostgreSQL row must be no longer than 8,000 characters. So in practice PostgreSQL is less powerful than Oracle in this respect.

Microsoft SQL Server historically limited VARCHAR to 255 characters; as a reader notes in the comments below, SQL Server 7 raised the limit to 8,000 characters.

Caveat emptor.

Ease of Running Standard Programming Languages Internally

Within Oracle it is fairly easy to run Java and the quasi-standard PL/SQL. Within PostgreSQL it is fairly easy to run Perl, Tcl, and a sort-of-PL/SQL-like PL/pgSQL. Within Microsoft SQL Server you are essentially limited to the Transact-SQL dialect, plus extended stored procedures written in C.
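For instance, a trivial PL/SQL function, assuming a hypothetical order_lines table, can be defined once and then called from any SQL statement:

create or replace function order_total (v_order_id integer)
return number
is
  v_total number;
begin
  select sum(amount) into v_total
  from order_lines
  where order_id = v_order_id;
  return v_total;
end;
/

select order_total(42) from dual;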

Support

In theory you won't be calling for support very often but you want to make sure that when you do it is to an organization that takes RDBMS reliability and uptime very seriously.

Paying an RDBMS Vendor

"PostgreSQL is available without cost," is the opening to Chapter 1 of the PostgreSQL documentation. Microsoft has the second easiest-to-figure-out pricing: visit http://www.microsoft.com/sql/ and click on "pricing and licensing". The price in 1998 was $4400 for software that could be used on a 4-CPU machine sitting behind a Web site. As of September 2000 they were charging either $20,000 or $80,000 for a 4-CPU Web server, depending on whether you wanted "enterprise" or "standard" edition.

Despite its industrial heritage, Oracle can be much cheaper than Microsoft. Microsoft charges $500 for a crippled developer edition of SQL Server; Oracle lets developers download the real thing for free from technet.oracle.com. Microsoft wants $20,000 per CPU; Oracle negotiates the best deal that they can get but lately has been selling startups a "garage" license for $10,000 for two years.

Performance

Be assured that any RDBMS product will be plenty slow. We once had 70,000 rows of data to insert into Oracle 8. Each row contained six numbers. It turned out that the data wasn't in the most convenient format for importation, so we wrote a one-line Perl script to reformat it. It took less than one second to read all 70,000 rows, reformat them, and write them back to disk in one file. Then we started inserting them into an Oracle 8 table from a custom C application. It took about 20 minutes (60 rows/second). By using SQL*Loader we probably could have approached 1,000 rows/second, but that still would have been 70 times slower than the Perl script. Providing application programmers with the ACID guarantees is always going to be slow.

There are several ways to achieve high performance. If most of your activity is queries, you could start by buying a huge multi-processor computer with enough RAM to hold your entire database at once. Unfortunately, if you are paying by the CPU, your RDBMS vendor will probably give your bank account a reaming that it will not soon forget. And if you are processing a lot of INSERTs and UPDATEs, all those CPUs bristling with RAM won't help you. The bottleneck will be disk spindle contention. The solution to this is to chant "Oh what a friend I have in Seagate." Disks are slow. Very slow. Literally almost one million times slower than the computer. It would be best to avoid ever going to disk as we did in the case of SELECTs by buying up enough RAM to hold the entire data set. However, the Durability requirement in the ACID test for transactions means that some record of a transaction will have to be written to a medium that won't be erased in the event of a power failure. If a disk can only do 100 seeks a second and you only have one disk, your RDBMS is going to be hard pressed to do more than about 100 updates a second.

Oracle manages to process more transactions per second than a disk's writes/second capacity. What the DBMS does is batch up transactions that come in at roughly the same time from different users. It writes enough to disk to make them all durable and then reports success to all of those users at once.

The first thing you should do is mirror all of your disks. If you don't have the entire database in RAM, this speeds up SELECTs because the disk controller can read from whichever disk is closer to the desired track. The opposite effect can be achieved if you use "RAID level 5" where data is striped across multiple disks. Then the RDBMS has to wait for five disks to seek before it can cough up a few rows. Straight mirroring, or "RAID level 1", is what you want.

The next decision that you must make is "How many disks?" The Oracle8i DBA Handbook (Loney and Theriault; 1999) recommends a 7x2 disk configuration as a minimum compromise for a machine doing nothing but database service. Their solutions start at 9x2 disks and go up to 22x2. The idea is to keep files that might be written in parallel on separate disks so that one can do 2200 seeks/second instead of 100.

Here's the Oracle8i DBA Handbook's 17-disk (mirrored x2) solution for avoiding spindle contention:

Disk  Contents
 1    Oracle software
 2    SYSTEM tablespace
 3    RBS tablespace (rollback segments, in case a transaction goes badly)
 4    DATA tablespace
 5    INDEXES tablespace (changing data requires changing indices; this allows those changes to proceed in parallel)
 6    TEMP tablespace
 7    TOOLS tablespace
 8    Online redo log 1, control file 1 (these would be separated on a 22-disk machine)
 9    Online redo log 2, control file 2
10    Online redo log 3, control file 3
11    Application software
12    RBS_2
13    DATA_2 (tables that tend to be grabbed in parallel with those in DATA)
14    INDEXES_2
15    TEMP_USER
16    Archived redo log destination disk
17    Export dump file destination disk

Now that you have lots of disks, you finally have to be very thoughtful about how you lay your data out across them. "Enterprise" relational database management systems force you to think about where your data files should go. On a computer with one disk, this is merely annoying and a distraction from development; you'd probably get similar performance with a simple RDBMS like PostgreSQL. But the flexibility is there in enterprise databases because you know which of your data areas tend to be accessed simultaneously and the computer doesn't. So if you do have a proper database server with a rack of disk drives, an intelligent manual layout can improve performance fivefold.
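In Oracle, that thinking happens when you create tablespaces. The file names below are hypothetical; the point is that each tablespace is pinned to a known spindle:

create tablespace data
  datafile '/disk04/oradata/prod/data01.dbf' size 2000m;

create tablespace indexes
  datafile '/disk05/oradata/prod/indexes01.dbf' size 1000m;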

Don't forget to back up

Be afraid. Be very afraid. Standard Unix or Windows NT file system backups will not leave you with a consistent and therefore restorable database on tape. Suppose that your RDBMS is storing your database in two separate Unix filesystem files, foo.db and bar.db. Each of these files is 200 MB in size. You start your backup program running and it writes the file foo.db to tape. As the backup is proceeding, a transaction comes in that requires changes to foo.db and bar.db. The RDBMS makes those changes, but the ones to foo.db occur to a portion of the file that has already been written out to tape. Eventually the backup program gets around to writing bar.db to tape and it writes the new version with the change. Your system administrator arrives at 9:00 am and sends the tapes via courier to an off-site storage facility.

At noon, an ugly mob of users assembles outside your office, angered by your introduction of frames and failure to include WIDTH and HEIGHT tags on IMGs. You send one of your graphic designers out to explain how "cool" it looked when run off a local disk in a demo to the vice-president. The mob stones him to death and then burns your server farm to the ground. You manage to pry your way out of the rubble with one of those indestructible HP Unix box keyboards. You manage to get the HP disaster support people to let you use their machines for a while and confidently load your backup tape. To your horror, the RDBMS chokes up blood following the restore. It turns out that there were linked data structures in foo.db and bar.db. Half of the data structures (the ones from foo.db) are the "old pre-transaction version" and half are the "new post-transaction version" (the ones from bar.db). One transaction occurring during your backup has resulted in a complete loss of availability for all of your data. Maybe you think that isn't the world's most robust RDBMS design, but there is nothing in the SQL standard or the manufacturers' documentation that says Oracle, PostgreSQL, or SQL Server can't work this way.

Full mirroring keeps you from going off-line due to media failure. But you still need snapshots of your database in case someone gets a little excited with a DELETE FROM statement or in the situation described above.

There are two ways to back up a relational database: off-line and on-line. For an off-line backup, you shut down the databases, thus preventing transactions from occurring. Most vendors would prefer that you use their utility to make a dump file of your off-line database, but in practice it will suffice just to back up the Unix or NT filesystem files. Off-line backup is typically used by insurance companies and other big database users who only need to do transactions for eight hours a day.
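For an Oracle 8 installation on Unix, the off-line procedure can be as crude as the following sketch (paths hypothetical; svrmgrl was Oracle's administrative shell in those days):

# shut the database down cleanly
svrmgrl <<EOF
connect internal
shutdown immediate
EOF

# archive the data files to tape
tar cf /dev/rmt/0 /ora01/oradata/prod

# bring the database back up
svrmgrl <<EOF
connect internal
startup
EOF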

Each RDBMS vendor has an advertised way of doing on-line backups. It can be as simple as "call this function and we'll grind away for a couple of hours building you a dump file that contains a consistent database but minus all the transactions that occurred after you called the function." Here is the shell command that will export a snapshot of an Oracle database into a dump file:

exp DBUSER/DBPASSWD file=/exportdest/foo.980210.dmp owner=DBUSER consistent=Y
This exports all the tables owned by DBUSER, pulling old rows from a rollback segment if a table has undergone transactions since the dump started. If you read Oracle Performance Tuning (Gurry and Corrigan 1996; O'Reilly), you'll find some dark warnings that you must export periodically in order to flush out cases where Oracle has corrupted its internal data structures. Another good reason to export is that periodically dropping all of your tables and importing them is a great way to defragment data. At ArsDigita we export every customer's Oracle database every night, except the handful of customers with terabytes of data.
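The import half of that defragmentation cycle looks roughly like this (flags from memory; consult the imp utility documentation):

imp DBUSER/DBPASSWD file=/exportdest/foo.980210.dmp fromuser=DBUSER touser=DBUSER ignore=Y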

What if your database is too large to be exported to a disk and can't be taken offline? Here's a technique practiced by a lot of experienced IT groups:
  1. break the mirror so that one set of disks holds a snapshot of the database as of a single instant
  2. back up from that now-quiescent set of disks to tape
  3. re-establish the mirror and let the disks resynchronize

What if one of the on-line disks fails during backup? Are transactions lost? No. The redo log is on a separate disk from the rest of the database. This increases performance in day-to-day operation and ensures that it is possible to recover transactions that occur while the mirror is broken, albeit with some off-line time. Some organizations have three mirrors. They can pull off one set of physical disks and back them up without running the risk that a drive failure during the backup window will take the database management system offline.

The lessons here are several. First, whatever your backup procedure, make sure you test it with periodic restores. Second, remember that at most companies the backup and maintenance of an RDBMS is done by a full-time staffer, called "the DBA", short for "database administrator". If the software worked as advertised, you could expect a few days of pain during the install and then periodic recurring pain to keep current with improved features. In reality, DBAs earn their moderately lavish salaries: no amount of marketing hype suffices to make a C program work as advertised, and that goes for an RDBMS just as much as for a word processor. Coming to terms with bugs can be a full-time job at a large installation. Most often this means finding workarounds, since vendors are notoriously sluggish with fixes. Another full-time job is hunting down users who are doing queries that take 1,000 times longer than necessary because they forgot to build indices or don't know SQL very well. Children's Hospital has three full-time DBAs and they work hard.

If all of this sounds rather tedious just to ensure that your data are still around tomorrow, you might be cheered by the knowledge that Oracle dbas are always in high demand and start at $60,000 to $80,000 a year. When the Web bubble bursts and your friends who are "HTML programmers" are singing in the subway, you'll be kicking back at some huge financial services firm.

We'll close by quoting Perrin Harkins. A participant in the Web/db question-and-answer forum (http://www.arsdigita.com/bboard/q-and-a.tcl?topic=web/db) asked whether caching database queries in Unix files would speed up his Web server. Here's Perrin's response:

"Modern databases use buffering in RAM to speed up access to often requested data. You don't have to do anything special to make this happen, except tune your database well (which could take the rest of your life)."

Reference

Loney and Theriault, Oracle8i DBA Handbook (Osborne/McGraw-Hill, 1999)
Gurry and Corrigan, Oracle Performance Tuning (O'Reilly, 1996)

philg@mit.edu

Reader's Comments

Thought I'd point this out: Oracle 9i has now apparently become one of the better XML databases available. While I have no practical experience as to whether PostgreSQL's implementation is better or even comparable, it's nevertheless interesting to note that even Yukon probably won't carry this feature.

-- Akshay R, May 25, 2004
Since MS Sql Server 7, the varchar data type can hold 8000 characters. (Don't know about earlier versions). MS Sql Server 2005 Yukon is supposed to allow .NET code in stored procedures. In my opinion, the chief limitation of MS Sql Server is that it lacks "Oracle Forms and Reports." Yukon is supposed to remedy this.

-- Louis N, July 29, 2004
PostgreSQL does support arbitrarily long text fields; I have been using it at least since the 7.3 (probably 7.2) days:
SELECT length(nota) as size from articulo order by size desc;
size
-------
70720
56067
38961
34634
33388
(...)


-- Gunnar Wolf, December 17, 2004