Beyond the RDBMS: the Brave New (Old) World of NoSQL
by Andrew Grumet and Philip Greenspun in January 2011, as part of Three Day RDBMS
Consider the following:
With its native support for concurrent updates, the RDBMS enabled programmers of ordinary skill to build the early collaborative Internet applications in a reliable enough fashion to be useful. The RDBMS may have become a victim of its own success, as those applications were so useful to the millions of early Web users that they encouraged billions to sign up for Internet access (users by country). Twitter, a primarily text-based service, was collecting 7 TB of new data every day in early 2010 (slide show). An accepted definition of a "very large database" in 2000 was "more than 1 TB" and a database size between 5 and 50 terabytes (one week of Twitter data!) was something to write about in an academic journal (example).
When the relational model is a natural fit to your data, the simplest and usually least expensive way to run any Internet application is with a single physical computer running an RDBMS from a set of local hard drives. The system administration and hosting costs of running one computer are lower than those of running more than one computer. The cost in time, hardware, and dollars of synchronizing data is never cheaper than when all of the data are in the main memory of one machine. The costs of software development and maintenance are also low, since SQL is so widely understood and SQL programs tend to be reliable.
Below are some criteria for evaluating when it is time to consider abandoning the simple one-server RDBMS as the backend for an application.
It is difficult to find realistic benchmarks for the kind of database activity imposed by an Internet application. The TPC-E benchmark is probably the closest among industry standards. TPC-E is a mixture of fairly complex reads (SELECTs) and writes (INSERTs and UPDATEs) into an indexed database whose size scales up with the transaction rate being tested. The "transactions per second" figures put out by the TPC include both reads and writes. In 2010, a moderately priced Dell 4-CPU (32 core) server hooked up to, literally, more than 1000 hard disk drives, processed about 2,000 transactions per second into an 8-terabyte database.
The authors crowd-sourced the question in this blog posting; for smaller databases that can be stored mostly on solid-state drives, it seems that a modest $10,000 or $20,000 computer should be capable of 10,000 or more SQL queries and updates per second.
If you think that there is a realistic chance of exceeding one thousand SQL operations every second, you should put some effort into benchmarking your hardware to see if it falls over. Note that at three SQL statements per page served, 10 pages per user session, and 86,400 seconds in a day, 1,000 SQL operations per second translates to a capacity of 2,880,000 user sessions per day (assuming that the load is smooth, which is probably unreasonable, so divide by two to get 1,440,000 sessions per day).
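The capacity arithmetic can be checked with a quick back-of-the-envelope script (the per-page and per-session figures are assumptions carried over from the discussion above):

```python
# Back-of-the-envelope capacity estimate; the input figures are assumptions.
SQL_PER_PAGE = 3          # SQL statements issued per page served
PAGES_PER_SESSION = 10    # pages viewed in a typical user session
OPS_PER_SECOND = 1_000    # sustained SQL operations per second
SECONDS_PER_DAY = 86_400

ops_per_day = OPS_PER_SECOND * SECONDS_PER_DAY      # 86.4 million SQL ops/day
sql_per_session = SQL_PER_PAGE * PAGES_PER_SESSION  # 30 SQL ops per session
sessions_per_day = ops_per_day // sql_per_session   # 2,880,000 sessions/day
peak_adjusted = sessions_per_day // 2               # halved because load is never smooth
```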
Note that some Web developers manage to load the database with more than 1,000 SQL requests every second even when there are only a handful of users. In tools such as Ruby on Rails, for example, it is possible for a programmer to generate code that, unbeknownst to him or her, will fetch 50,000 rows from the database using 50,000 SQL queries rather than a single query that returns 50,000 rows (and then of course the programmer will use Ruby to filter that down to the 30 rows that are displayed to the user!). See this tale of a sluggish Ruby on Rails server for more.
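The anti-pattern is easy to reproduce outside of Rails. Below is a sketch using Python's built-in sqlite3 module (table and column names are invented for illustration) contrasting 1,001 single-row queries with one query that returns all the rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1000)])

# The N+1 anti-pattern: one query to fetch the ids, then one query per row.
ids = [row[0] for row in conn.execute("SELECT id FROM users")]
slow = [conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()[0]
        for i in ids]                      # 1,001 round trips to the database

# The same result with a single query returning all rows at once.
fast = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]

assert slow == fast
```

Both lists are identical; the difference is invisible in the output and shows up only in server load and response time, which is exactly why the problem so often goes unnoticed until production.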
Users who live far, in network terms, from the data center will have a slower experience of the service than users who live close to the data center. There are simply more network hops for the data to travel, and if those data are not cacheable, users will have to make the full round trip every time. In some cases the round trips can be sped up using route optimization such as Akamai's dynamic site accelerator. Unlike a traditional content delivery network (CDN) setup, the edge servers in these systems do not cache; they simply proxy the uncacheable user data. Between the data center and the user, the data take a CDN-managed "express lane" that delivers higher speeds to the user. This arrangement allows you to improve global performance without having to distribute the data.
An alternative is to move the data closer to the users. If the data can be partitioned by the user id, you can set up a second data center in Europe and place data for EU community users there, a third data center in China and so on. Amazon.com did this when they set up Amazon UK and Amazon Germany. They copied all of the software onto a new server and set up shop (literally) in those foreign countries. It may be cumbersome to do a query comparing sales of a product in Germany to sales of the same product in the U.S., but management of each database is easier and the consequences of a failure don't shut down the business everywhere in the world.
A similar approach can be taken whenever there is limited value in comparing data from different users. Consider supporting smartphone users with an RDBMS storing contacts, emails, and calendar. Is there any value in comparing the content of Joe's email with Susan's phone numbers? If not, why lump them all together in one huge database? When that one huge database server fails, for example, every customer of the phone company will be calling in at once asking "What happened to my contacts?" [This is not a hypothetical, see "When the Cloud Fails: T-Mobile, Microsoft Lose Sidekick Customer Data".] Instead of one enormous cluster of exotic machines and hard drives, why not buy 100 1U ("pizza box") machines and assign customers to them according to the last two digits of their phone numbers?
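The last-two-digits scheme amounts to a one-line routing function. Here is a sketch in Python (the hostnames are invented for illustration):

```python
# Route each customer to one of 100 "pizza box" servers by the last two
# digits of the phone number. Hostnames here are invented placeholders.
def shard_for(phone_number: str) -> str:
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    bucket = int(digits[-2:])          # last two digits: 00..99
    return f"db-{bucket:02d}.example.com"
```

For example, shard_for("617-555-0142") returns "db-42.example.com". Because the mapping is a pure function of the phone number, any front-end machine can compute it without consulting a central directory, and a failure on one shard inconveniences only one percent of the customers.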
Given a standard hard drive layout, committed transactions can always be recovered in the event of hardware or software failure. That doesn't mean, however, that data will always be available. The simplest approach to redundancy is a "hot standby" server that has access to the transaction logs of the production machine. If the production server dies for any reason, the hot standby machine can roll forward from the transaction logs and, as soon as it is up to date with the last committed transaction, take the place of the dead server. As a bonus, the hot standby machine can be used for complex queries that don't need to include up-to-the-second changes. For Oracle 11g, look for "Oracle Data Guard" to learn more about this approach.
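The roll-forward idea can be illustrated with a toy sketch: the standby rebuilds its state by replaying only the committed entries from the production server's log. The log format here is invented for illustration; a real transaction log is a binary, low-level record of page or row changes:

```python
# Toy roll-forward: replay committed log entries into a standby's state.
# The dictionary-based "log" and "database" are illustrative inventions.
def roll_forward(standby: dict, log: list) -> dict:
    for entry in log:
        if entry["committed"]:             # uncommitted work is never replayed
            for key, value in entry["writes"].items():
                standby[key] = value
    return standby

log = [
    {"committed": True,  "writes": {"balance:alice": 100}},
    {"committed": True,  "writes": {"balance:alice": 70, "balance:bob": 30}},
    {"committed": False, "writes": {"balance:bob": 1_000_000}},  # in-flight at crash time
]
standby = roll_forward({}, log)
# standby is now {"balance:alice": 70, "balance:bob": 30}
```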
To facilitate disaster recovery, e.g., after a fire that destroys a server room, changes to the database must be sent to a server in another building or in another part of the world. If transaction logs are sent every night, the worst possible consequence of the disaster will be the loss of a day's transactions. If that isn't acceptable, it will be necessary to use various replication and distribution strategies to ensure that transactions are transmitted to the disaster recovery server as part of the commit process (look up "two phase commit" in the Oracle documentation, for example).
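The commit-time guarantee can be sketched with a toy two-phase commit: the coordinator asks every site to prepare, and the transaction commits only if every site votes yes. This is an in-memory illustration of the idea, not Oracle's implementation; all class and method names are invented:

```python
# Toy two-phase commit between a primary site and a disaster-recovery site.
class Participant:
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.data = {}          # "durable" state
        self.pending = None     # writes prepared but not yet committed

    def prepare(self, writes):  # phase 1: vote on whether we can commit
        if not self.healthy:
            return False
        self.pending = writes
        return True

    def commit(self):           # phase 2: make the prepared writes durable
        self.data.update(self.pending)
        self.pending = None

    def abort(self):
        self.pending = None

def two_phase_commit(participants, writes):
    if all(p.prepare(writes) for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:      # any "no" vote aborts everywhere
        p.abort()
    return False
```

The price of the guarantee is visible in the structure: every commit waits on a round trip to the slowest participant, which is why commit-time replication to a distant site is reserved for data that truly cannot be lost.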
Here are some examples of products that go beyond the "one computer" architecture but are not part of the "NoSQL", "NoACID" fad:
At its simplest, a key-value database is a persistent associative array. The most familiar example is BerkeleyDB, a key/value store derived from the 1979 dbm. Since a key-value database can be implemented straightforwardly in any RDBMS as a single table with a VARCHAR key column and a BLOB value column, modern key/value databases tend to arise in the context of solving one or more of the RDBMS' deficiencies. Examples:
If you'd like to say that you're running both an RDBMS and a NoSQL DBMS, this posting on HandlerSocket explains how to bypass the SQL parser and turn MySQL into a NoSQL database (thanks to Michael Edwards for pointing this out).
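The single-table implementation mentioned above takes only a few lines. Here is a sketch using Python's built-in sqlite3 module (class, table, and column names are invented for illustration):

```python
import sqlite3

# A key-value store as a single RDBMS table: a VARCHAR key and a BLOB value.
class KVStore:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE kv (k VARCHAR PRIMARY KEY, v BLOB)")

    def put(self, key: str, value: bytes):
        # INSERT OR REPLACE makes put() an upsert on the primary key.
        self.conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                          (key, value))

    def get(self, key: str):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?",
                                (key,)).fetchone()
        return row[0] if row else None
```

That this is so easy is the point: dedicated key/value stores earn their keep not by doing something an RDBMS cannot, but by doing it with less overhead, across more machines, or with weaker consistency guarantees.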
A familiar problem for users of object-oriented languages is how to map the runtime class hierarchy to a relational database when object persistence is needed. To meet the challenge, a variety of object-relational mapping systems, such as Java's Hibernate and Ruby on Rails' ActiveRecord, have evolved. Object databases, on the other hand, represent objects directly, without any translation overhead. Given the popularity of object-oriented languages, such as Java, it is a mystery why object databases aren't more popular. Indeed, folks in the 1980s were already talking about the imminent death of the RDBMS, to be supplanted by the mighty new ODBMS. One theory: database users turned out to care more about attributes than identity. Object DBMSes are very fast if you already know what you're looking for, e.g., the thing that this other thing points to. Relational databases, on the other hand, are better suited when you need to query for objects matching certain criteria, e.g., the things that are blue, happened on a Tuesday, and were not given away free.
MapReduce is a framework for splitting up a big computing task into a number of smaller tasks that can be run in parallel. Let's illustrate with an example. Suppose that you have a very large HTTP server log file to parse on a mostly-idle four-core machine. Perhaps you need to count up the number of bytes returned for all responses delivered with either a 200 or 206 status code. You write a software routine that identifies lines matching the criteria for the request portion and status code, extracts the bytes transferred for those lines, and adds them to a running sum. Then you kick off the job and wait. As the parsing proceeds, you notice that only one of your four cores is busy. How could you use the full power of the machine to speed up the job?
One option is to split the log file into four roughly equal-sized parts, making sure to split along line boundaries in order to avoid parse errors. Now you can run four copies of the parser in parallel, one on each of the smaller files. Each routine runs on an available core, taking a filename as input and returning a byte count as output. Assuming that the processes are CPU-bound, this should run about four times faster than the original program. A final piece of computation is still required: you must add up the outputs of the four jobs to get the final byte count.
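The scheme above translates directly into code. Below is a sketch in Python that splits on line boundaries, maps a byte-counting parser over the chunks in parallel, and sums the partial results; a thread pool is used for brevity (for a genuinely CPU-bound parse you would use a process pool, but the structure is identical). The log-line format assumed is the common Apache style, with the status code and byte count at the end of each line:

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Matches lines whose status code is 200 or 206 and captures the byte count,
# assuming Apache-style log lines ending in: ..." <status> <bytes>
LINE = re.compile(r'" (200|206) (\d+)$')

def count_bytes(lines):
    """The "map" step: sum the bytes transferred in one chunk of lines."""
    total = 0
    for line in lines:
        m = LINE.search(line)
        if m:
            total += int(m.group(2))
    return total

def mapreduce_bytes(log_lines, workers=4):
    # Split into chunks along line boundaries (the lines are already split).
    n = max(1, len(log_lines) // workers)
    chunks = [log_lines[i:i + n] for i in range(0, len(log_lines), n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(count_bytes, chunks))  # map phase
    return sum(partial_sums)                                # reduce phase
```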
This is the essence, then, of a system that implements MapReduce: split the input into independent chunks, map a worker function over the chunks in parallel, and reduce the workers' partial outputs into a final result.
It turns out that a lot of the basic work of splitting, mapping, dispatching, and reducing can be formalized for reuse in building MapReduce systems; Apache Hadoop is one such framework. Note that MapReduce is not itself a database management system. Instead, a database management system may employ MapReduce to run queries against large data populations using multiple cores and/or machines.
More: Wikipedia