A friend recently took over development of a business-to-business site (i.e., one where the number of users will be limited) and faces the challenge of re-architecting the software to run from a standard relational database management system rather than Mongo, a “NoSQL” database. It turns out that the business requirements are most easily met with the flexibility of programming in SQL. He shared with me this anecdote: “A friend of mine, who works at [big company with infinite money], told me about a colleague who decided to use Mongo for something there, only later to change his mind and use Postgres. When asked, the friend said, ‘Well, it sounded really cool at first, until I realized that NoSQL is just another name for what people did before there were relational databases.’”
12 thoughts on “Simplest way to explain NoSQL database management systems”
NoSQL is very valuable for certain use cases (perhaps not very common) in terms of scalability and dataset sizes. The problem was picking a technology because it sounded “cool”.
What did people use before RDBMS? Filesystems? Did they support sharding, elastic scaling, or map/reduce operations?
Wall Street companies with infinite money retain their expensive programmers by letting them rewrite their systems every few years in the buzzword-du-jour-technology to keep themselves entertained.
TK: Are you sure that the file system was the most commonly used DBMS prior to the RDBMS? Why wouldn’t the network or hierarchical DBMS have been the standard choice?
MongoDB, Cassandra, etc. fill a very real need for a fairly small but complex and interesting slice of the tech world. The problem is that every geek wants to believe that his technology problems are complex and interesting, and usually they just aren’t.
Also overlooked is that for many instances of “complex and interesting”, even in the relevant domains, Postgres or MySQL can be used in a relatively schema-free manner, and at high performance and higher supportability than the NoSQL alternatives.
But sometimes, truly, these new databases are your only hope. And I stress *hope*, because you’re usually betting the (startup) business on the product.
SQL + SSDs > NoSQL for 99.999% of applications. Cheap consumer SSDs are approaching 1000x the IOPS of spinning disks, and any revenue producing business can afford their $1/GB price.
Dunno about Mongo and other such, but ObjectStore was a good fit for a couple places I’ve seen it used. Object-Relational impedance mismatch is a real thing, as long as we’re talking about SQL. Maybe Tutorial-D is the answer but I never read up on it. Representing graphs in SQL also remains problematic.
If you’re doing something like a complicated simulation with dozens of kinds of things in hierarchies, the queries to deserialize the objects get absurd fast. Not everybody is working with “business data”.
My comment sounds sarcastic but it was an actual question, as I was curious what came before RDBMS. Reading about hierarchical and network DBMS, it sounds like they use different data modeling but are still essentially the same in terms of scalability, in that they are designed to run on a single machine.
RDBMS is unsurpassed in terms of maturity and data modeling. I will choose an RDBMS almost every time. But you can beef up a single machine with only so much memory and CPU. As data/load increases, you have to deal with all sorts of issues such as the dreaded “snapshot too old” error in Oracle. In MySQL, adding columns to a large table can take hours because it has to rebuild the table, which is not feasible in terms of downtime.
When you get to a point where you have to start joining extremely large tables, you often have to denormalize data anyway with indexed or materialized views. As for sharding, you either manually shard tables or rows, which is problematic (distributed transactions, ugh) or spend gobs of money on a SAN.
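To make the manual sharding mentioned above concrete, here is a minimal sketch, not a production design. It assumes three in-memory SQLite databases standing in for three servers; the `users` table and the helper names `shard_for`, `put_user`, `get_user`, and `count_users` are hypothetical, invented for illustration.

```python
import hashlib
import sqlite3

# Assumption: three in-memory SQLite databases stand in for three servers.
SHARDS = [sqlite3.connect(":memory:") for _ in range(3)]
for conn in SHARDS:
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")


def shard_for(user_id: str) -> sqlite3.Connection:
    """Pick a shard from a stable hash of the key (not Python's salted hash())."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def put_user(user_id: str, name: str) -> None:
    conn = shard_for(user_id)
    with conn:  # commits on success; the transaction covers this one shard only
        conn.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
            (user_id, name),
        )


def get_user(user_id: str):
    row = shard_for(user_id).execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def count_users() -> int:
    """Cross-shard queries fan out to every shard and merge in application code."""
    return sum(c.execute("SELECT COUNT(*) FROM users").fetchone()[0] for c in SHARDS)
```

Single-key reads and writes stay simple, but anything that spans keys (`count_users` here, or a transaction touching two users on different shards) has to be coordinated by the application, which is where the pain comes from.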
NoSQL is really about scalability, elasticity, flexibility, and availability. Which is why it’s being pioneered by the likes of Google, Amazon, Facebook, Netflix, etc. to solve real-world problems.
BTW, despite the overuse of the word “cloud”, it really is cool that you can spin up hundreds of machines, do some work, write out a check, and move on.
Kind of old but here’s some interesting reading.
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
TK,
Why add columns to RDBMS tables dynamically? You can store objects in table rows and append rows. A row append can be a non-locking operation, and reads can rely on row-level locking (i.e., a row is sized to fit a single page). This should scale to hundreds or thousands of concurrent users even on a single server.
Nix MySQL and use real RDBMS – many of them support multiple servers and cloud.
Sure, it would be fun to implement a NoSQL hash-based solution, but why, if you are not Google?
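One way to picture the row-per-object idea from this comment is the sketch below. It is only an illustration under stated assumptions: SQLite is used as a stand-in RDBMS, objects are serialized as JSON text, and the `objects` table and the `save` and `load` helpers are hypothetical names. It shows the append-only layout, and says nothing about the locking behavior of any particular database.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE objects (
        seq       INTEGER PRIMARY KEY AUTOINCREMENT,  -- append order
        object_id TEXT NOT NULL,
        kind      TEXT NOT NULL,
        body      TEXT NOT NULL                       -- serialized object (JSON)
    )
    """
)
conn.execute("CREATE INDEX objects_by_id ON objects (object_id, seq)")


def save(object_id: str, kind: str, obj: dict) -> None:
    """Append a new version of the object; existing rows are never updated."""
    with conn:
        conn.execute(
            "INSERT INTO objects (object_id, kind, body) VALUES (?, ?, ?)",
            (object_id, kind, json.dumps(obj)),
        )


def load(object_id: str):
    """Return the most recently appended version, or None if there is none."""
    row = conn.execute(
        "SELECT body FROM objects WHERE object_id = ? ORDER BY seq DESC LIMIT 1",
        (object_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

New fields appear simply by saving an object that has them, so no DDL is needed, while `object_id` and `kind` remain ordinary indexed columns.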
I can’t tell you the number of people I’ve run across who want to make SQL “completely dynamic” so that users can define data any way they want. They generally gravitate towards an EAV design, and are then delighted with the fact that they will never again have to do any DDL. Or any conceptual modeling.
Until they try to make meaningful use of the resulting data. You might as well do archaeology on a tel.
There are uses for NoSQL-type tools. But those who use them to pull an end run around requirements analysis are just taking a cheap shortcut.
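A small sketch of why EAV data is hard to use: the same question ("which Boston customers have a credit limit over 1000?") asked of a modeled table and of an EAV table. The tables, sample rows, and the `ids` helper are all made up for illustration, with SQLite as a stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conventional, modeled table.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT, credit_limit INTEGER);
    INSERT INTO customer VALUES (1, 'Boston', 5000), (2, 'Boston', 500), (3, 'Austin', 9000);

    -- The same facts as entity-attribute-value rows; every value is text.
    CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT);
    INSERT INTO eav VALUES
        (1, 'city', 'Boston'), (1, 'credit_limit', '5000'),
        (2, 'city', 'Boston'), (2, 'credit_limit', '500'),
        (3, 'city', 'Austin'), (3, 'credit_limit', '9000');
    """
)

MODELED_QUERY = "SELECT id FROM customer WHERE city = 'Boston' AND credit_limit > 1000"

# One self-join per attribute, plus a cast because the value column is untyped.
EAV_QUERY = """
    SELECT c.entity
    FROM eav AS c
    JOIN eav AS l ON l.entity = c.entity AND l.attribute = 'credit_limit'
    WHERE c.attribute = 'city' AND c.value = 'Boston'
      AND CAST(l.value AS INTEGER) > 1000
"""


def ids(sql: str) -> list:
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]
```

Both queries return the same customer, but the EAV version grows by one join per attribute, and nothing in the schema stops someone from storing `'lots'` as a credit limit.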
TK,
Codd’s 1970 paper details the comparison between existing DBMS models and the then proposed relational system. Codd introduced “data models” in order to make this comparison in a systematic way. The biggest database of the day might have been IMS. It was years before Oracle surpassed it.
Existing DBMS products had advanced the state of the art regarding transactions and concurrency, organized database backups, and self-defining data quite a ways beyond mere files and records, before relational took hold.
I never learned all of this when I was coming up to speed on relational databases. It was only later that I wanted to take a look backward. Those who do not learn from history are condemned to repeat it.
Walter Mitty,
I only suggest row per object as one of the solutions. It would require an initial design that fits the problem it is supposed to solve. It should be combined with extensive logging and XSD specifications, which can be stored in the database too. I would use it if most of the project benefits from a relational schema but some functionality needs to serve and modify a range of different objects that satisfy several common criteria (implement common interfaces).
It does not make sense to introduce another data layer (NoSQL) if a relational DB is already required because of the features you mentioned.