A tale of two servers…
Server #1: I attended a presentation by a guy with a background in law and business, what we programmers commonly call “an idiot”. He had set out to build a site where people could type in the books, DVDs, and video games that they owned and were willing to swap with others. Because he didn’t know anything about programming, he stupidly listened to Microsoft and did whatever they told him to do. He ended up with a SQL Server machine equipped with 64 GB of RAM and a handful of front-end machines running ASP.NET pages that he and a few other mediocrities cobbled together as best they could. The site, swaptree.com, handles its current membership of more than 400,000 users (growing rapidly) without difficulty. A typical query looks like “In exchange for this copy of Harry Potter, what are the 400,000 other users willing to trade? And then for each of those things, what else might some third user (chosen from the 400,000) give in a three-way trade?” This kind of query is done many times a second and may return more than 20,000 rows.
Server #2: A non-technical friend hired an MIT-educated software engineer with 20 years of experience to build an innovative shopping site, presenting Amazon-style pages with thumbnails and product descriptions. Let’s call my friend’s site mitgenius.com. The programmer, being way smarter than the swaptree idiot, decided to use Ruby on Rails, the latest and greatest Web development tool. As only a fool would use obsolete systems such as SQL Server or Oracle, our brilliant programmer chose MySQL. What about hosting? A moron might have said “this is a simple site just crawling its way out of prototype stage; I’ll buy a server from Dell and park it in my basement with a Verizon FiOS line going out.” An MIT genius, though, would immediately recognize the lack of scalability and reliability inherent in this approach.
How do you get scale and reliability? Start by virtualizing everything. The database server should be a virtual “slice” of a physical machine, without direct access to memory or disk, the two resources that dumb old database administrators thought that a database management system needed. Ruby and Rails should run in some virtual “slices” too, restricted maybe to 500 MB or 800 MB of RAM. More users? Add some more slices! The cost for all of this hosting wizardry at an expert Ruby on Rails shop? $1100 per month.
For the last six months, my friend and his programmer have been trying to figure out why their site is so slow. It could take literally 5 minutes to load a user page. Updates to the database were proceeding at one every several seconds. Was the site heavily loaded? About one user every 10 minutes.
I began emailing the sysadmins of the slices. How big was the MySQL database? How big were the thumbnail images? It turned out that the database was about 2.5 GB and the thumbnails and other stuff on disk worked out to 10 GB. The servers were thrashing constantly and every database request went to disk. I asked “How could this ever have worked?” The database “slice” had only 5 GB of RAM. It was shared with a bunch of other sites, all of which were more popular than mitgenius.com. Presumably the database cache would be populated with pages from those other sites’ tables because they were more frequently accessed.
How could a “slice” with 800 MB of RAM run out of memory and start swapping when all it was trying to do was run an HTTP server and a scripting language interpreter? Only a dinosaur would use SQL as a query language. Much better to pull entire tables into Ruby, the most beautiful computer language ever designed, and filter down to the desired rows using Ruby and its “ActiveRecord” facility.
Not helping matters was the fact that the sysadmins found some public pages that went into MySQL 1500 times with 1500 separate queries (instead of one query returning 1500 rows).
In reviewing email traffic, I noticed much discussion of “mongrels” being restarted. I never did figure out what those were for.
As the MIT-trained software engineer had never produced any design documentation, I could not criticize his system design. However, I suggested naively that a site with 12.5 GB of data required to produce customer pages would need a server with at least 12.5 GB of RAM ($500 retail for the DIMMs?). In the event that different customers wanted to look at different categories of products, it would not be sufficient to have clever indices or optimized queries. Every time the server needed to go to disk it was going to be 100,000 times slower than pulling data from RAM.
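The 100,000x figure is easy to sanity-check. Here is a rough sketch in Ruby, assuming about 100 ns per RAM access and about 10 ms per random disk seek (ballpark numbers, not measurements from mitgenius.com). It shows the ratio and how quickly the average cost of an access grows once the cache hit rate drops below 100 percent:

```ruby
# Ballpark latencies, assumed for illustration only:
# ~100 ns for a RAM access, ~10 ms for a random disk seek.
RAM_NS  = 100.0
DISK_NS = 10_000_000.0   # 10 ms expressed in nanoseconds

# Average cost of one data access when a fraction `hit_rate`
# of requests is served from RAM and the rest must go to disk.
def effective_access_ns(hit_rate)
  hit_rate * RAM_NS + (1.0 - hit_rate) * DISK_NS
end

puts "disk/RAM latency ratio: #{(DISK_NS / RAM_NS).round}"
[1.0, 0.99, 0.9, 0.5].each do |h|
  printf("hit rate %.2f -> %.0f ns per access\n", h, effective_access_ns(h))
end
```

Even at a 99 percent hit rate the average access comes out roughly 1,000 times slower than pure RAM. That is why clever indices can't rescue a box whose working data doesn't fit in memory.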
My Caveman Oracle/Lisp programmer solution: 2U Dell server with 32 GB of RAM and two disk drives mirrored. No virtualization. MySQL and Ruby on Rails running as simultaneous processes in the same Linux. Configure the system with no swap file so that it will use all of its spare RAM as file system cache (we tore our hair out at photo.net trying to figure out why a Linux machine with 12 GB of RAM, bought specifically to serve JPEGs, would use only about a third of it for file system cache; it stumped all of the smartest sysadmins and the answer turned out to be removing the swap file). Park at local ISP and require that the programmer at least document enough of the system that the ISP’s sysadmin can install it. If usage grows to massive levels, add some front-end machines and migrate the Ruby on Rails processes to those.
What am I missing? To my inexperienced untrained-in-the-ways-of-Ruby mind, it would seem that enough RAM to hold the required data is more important than a “mongrel”. Can it be that simple to rescue my friend’s site?
[August 2009 update: The site has been running for a couple of months on its own cheapo Dell pizza box server with 16 GB of RAM. The performance problems that the Ruby on Rails experts had been chasing for months disappeared and the site is now responsive.]
[March 2010 update: My friend is now in discussions with some large companies interested in using his technology or service. Andrew Grumet and I published “Software Design Review” as a result of this experience.]
I think you hit the nail right on the head about memory being the problem. $1100 a month for that kind of hosting sounds awfully steep – a colocated server is almost certainly a more appropriate tool for the job.
FYI: Mongrel is a Ruby web server, and running more mongrels is just having more front-end server processes running. It’s a simple thing to change when you’re scaling up, but more mongrels won’t do any good if your database is the bottleneck or if you don’t have enough memory for them.
The best way to field a scalable Ruby on Rails app is to find a host that can preconfigure and host the stack, out of the box, so to speak. Heroku is one such host. It need not be so tortured.
You almost got it.
I have found it better to use a company that will replace your hardware when there is a failure (SoftLayer). Benchmarks have shown MySQL to be 2x slower in a virtualized environment. So far I have found virtualization most useful for development.
Get NewRelic to see what parts of your app are popular and slow. Optimize those parts, cache in memcached (better place to use all that ram) so you don’t hit the DB. If you still have problems with db, put it on an Intel SSD (do whatever you can to keep from sharding).
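The memcached suggestion is the usual cache-aside pattern: look in the cache first and query the database only on a miss. Below is a minimal sketch with a plain Ruby Hash standing in for a memcached client. The CacheAside class and its loader block are hypothetical names, not a real library API:

```ruby
# Cache-aside sketch: a Hash stands in for memcached (hypothetical stand-in).
class CacheAside
  attr_reader :db_hits

  def initialize(&loader)
    @store   = {}       # in-memory stand-in for the cache servers
    @loader  = loader   # block that performs the expensive database query
    @db_hits = 0
  end

  # Return the cached value for key, calling the loader only on a miss.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @db_hits += 1
    @store[key] = @loader.call(key)
  end
end

cache = CacheAside.new { |id| "rendered page for product #{id}" }
3.times { cache.fetch(42) }
puts cache.db_hits   # prints 1: only the first fetch reached the "database"
```

Every cached entry is extra RAM on top of the database's own copy of the same data.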
What you’re missing is that your solution doesn’t involve cloud computing. Don’t you know that cloud computing is the future? TechCrunch says so like every day.
Cloud computing!
Ignoring your architecture question and zooming in on one detail: Mongrels used to be the best way to run the Ruby processes for a Rails app. Now it’s Phusion Passenger, http://www.modrails.com/, which is insanely easier to set up and deal with.
RAM is king. If you need 16 GiB or more, you are almost always better off buying a server and co-locating it. (Hosting it in the basement to save $100/month is rarely worth it; I spend that much on DSL and I only use my garage as a test lab.) Even at 8 GiB, buying usually wins if the server is going to be hosted for a few months. That said, the price difference between a 1U box with 8 GiB and one Core 2 Quad and a dual quad-core Opteron with 32 GiB of RAM is small enough that I’d go for the 32 GiB box anyhow; that is the win with virtualization. If you only need 8 GiB, you can slice up your box and host friends or whatever in the spare space.
VPSs make a lot of sense if you need a small amount of RAM. I can sell you a VPS with 2 GiB of RAM for less than any 1U co-location I know of. It’s great for a dev box, or if you really just need something small. “Cloud” providers like EC2 and Mosso are brilliant for backup servers: you set up your backup servers, test them maybe once a month, and pay for them only if something horrible happens to your primary server. But they are usually a rather expensive way to run production.
In a recent talk, Joel of JoelOnSoftware.com described the construction of StackOverflow.com. Their experience found the Microsoft-based technology stack required a fraction of the hardware resources of the “LAMP” based servers. The extra cost of the MSFT licenses was more than offset by hardware savings.
Not being particularly fond of Microsoft, I’d love to hear of a counter-example LAMP success story.
Everyone: I had to delete a buttload of comments from Ruby on Rails enthusiasts. To start with, none of them picked up on the irony in the original post. Nearly all imagine that I thought it was a good idea for a Web page to submit 1500 database queries. About half of the Ruby on Rails programmers thought that I was either the publisher or programmer of the poorly performing site (in fact I’m just a guy who sent a few emails to help a friend). About 10 percent of the Ruby on Rails programmers thought that I was somehow involved with swaptree.com (I was a judge at a technology contest recently and heard a presentation by one of their founders).
The technical solutions of the Ruby on Rails commenters were interesting. Nearly all thought it was an outrageous idea to have as much RAM as site data, even when that RAM could be purchased for less than $1000. “Going to disk isn’t that slow.” Nearly all thought that a database management system was inherently horrifically slow. Their idea for improving performance was to precompute some queries and stuff crud either into the file system or into memcached, preferably with about four copies of the crud. None of them thought about the fact that this would increase the RAM requirements by a factor of 4 (since they now had one copy of the data in the database and 4 extra copies in memcached). None thought that it might be possible, by giving the RDBMS some RAM and being clever with SQL, to build adequately fast pages that directly queried MySQL (recall that mitgenius.com currently has at most one user at a time!).
A lot of the Ruby on Rails commenters got hung up on the idea of configuring a Linux box with no swap. I’ve edited the original post to explain a bit about this. We bought a server at photo.net specifically to serve user-uploaded images, mostly thumbnails. With 12 GB in the box and nothing for it to do but run Linux and lighttpd, we figured it would hold about 11.5 GB of images in file system buffer cache. In fact, it would not use more than a few GB for file system cache, leaving most of the memory unused, until we removed the vestigial swap file (which it never would have needed and, more than two years later, never has needed).
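If you want to see how much of a Linux box's memory is actually going to file system cache, as in the photo.net story, /proc/meminfo reports it in the Cached field alongside MemTotal and SwapTotal (values in kB). Here is a small sketch that parses that file. It is Linux-only, and parse_meminfo is a hypothetical helper name:

```ruby
# Parse /proc/meminfo-style text into { "FieldName" => value }.
# Most lines look like "Cached:          123456 kB" (values in kB).
def parse_meminfo(text)
  text.each_line.each_with_object({}) do |line, info|
    name, value = line.split(":", 2)
    next unless value
    info[name.strip] = value.strip.to_i   # to_i stops at the trailing " kB"
  end
end

if File.readable?("/proc/meminfo")
  info  = parse_meminfo(File.read("/proc/meminfo"))
  total = info.fetch("MemTotal")
  cache = info.fetch("Cached", 0)
  printf("file system cache: %d of %d kB (%.1f%%)\n", cache, total, 100.0 * cache / total)
  printf("swap configured:   %d kB\n", info.fetch("SwapTotal", 0))
end
```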
All of this reminds me of 1997. The dullards would use ASP and SQL Server. They had simple working Web sites that were reasonably efficient (since ASP pooled RDBMS connections and ran threaded inside a very efficient IIS). Their code was easy to debug for any corporate client/server slave because even if he or she didn’t understand the Web or VB, the SQL queries were right there in the code. Netflix is a good example of the dullard approach. The geniuses, by contrast, were building three-tiered systems with “application servers” in the middle. The genius-built servers consumed 50X more hardware resources, were much more difficult to debug or modify, and had far greater downtime. The state-of-the-art tools became orphans and albatrosses around the publishers’ necks. For 99 percent of the world’s programmers, it turned out that a simple architecture worked a lot better than a complex one that in some theoretical world might have offered higher performance.
It’s mostly that simple.
Look up Jim Gray’s “5 Minute Rule” paper and its two more recent updates. Access frequency determines where a data object should live in the storage hierarchy. At current prices and latencies, data accessed more often than every 6 hours should live in RAM rather than on rotational storage (flash, if available, is ideal for data accessed somewhere between every 15 minutes and roughly once a day).
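The five-minute-rule papers express this as a break-even reference interval: pages per MB of RAM divided by random accesses per second per disk, multiplied by the price of a disk drive divided by the price of a MB of RAM. The sketch below computes that interval. The inputs in the example call are placeholders, not real prices, so plug in current figures to reproduce numbers like the ones above:

```ruby
# Break-even reference interval (seconds) from Gray's five-minute rule:
# a page re-referenced more often than this is cheaper to keep in RAM.
#   interval = (pages per MB of RAM / random accesses per second per disk)
#            * (price of a disk drive / price of one MB of RAM)
def break_even_seconds(pages_per_mb_ram:, accesses_per_sec_per_disk:,
                       price_per_disk:, price_per_mb_ram:)
  (pages_per_mb_ram.to_f / accesses_per_sec_per_disk) *
    (price_per_disk.to_f / price_per_mb_ram)
end

# Placeholder inputs, not real prices: 8 KB pages (128 per MB), a disk doing
# 100 random reads per second, a $100 drive, and RAM at $0.02 per MB.
seconds = break_even_seconds(pages_per_mb_ram: 128,
                             accesses_per_sec_per_disk: 100,
                             price_per_disk: 100,
                             price_per_mb_ram: 0.02)
printf("keep in RAM if re-read more often than every %.0f minutes\n", seconds / 60)
```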
“Not helping matters was the fact that the sysadmins found some public pages that went into MySQL 1500 times with 1500 separate queries (instead of one query returning 1500 rows).”
ActiveRecord tends to be lazy. Most likely the application is doing something like this:
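# one query for the articles, then one more query per article
@articles = Article.find(:all)
@articles.each do |article|
  article.comments   # lazily loads this article's comments with a separate query
end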
In that case, you hit the database each time you call article.comments, since it needs to load the comments for that article. You can use eager loading instead, via the :include option: Article.find(:all, :include => :comments).
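To see why the loop above costs 1 + N round trips while eager loading needs only a couple, here is a plain-Ruby simulation with no Rails or MySQL involved. The FakeDB class is hypothetical and just counts queries. Its eager method plays the role of a single SELECT ... WHERE article_id IN (...), which is roughly what Rails's preloading does. Depending on the Rails version, :include may instead issue a single JOIN:

```ruby
# Hypothetical in-memory "database" that counts round trips.
class FakeDB
  attr_reader :queries

  def initialize(comments_by_article)
    @comments = comments_by_article   # { article_id => [comment, ...] }
    @queries  = 0
  end

  # Like Article.find(:all): one query for the list of articles.
  def article_ids
    @queries += 1
    @comments.keys
  end

  # Lazy loading: one query per article.
  def comments_for(id)
    @queries += 1
    @comments.fetch(id, [])
  end

  # Eager loading: one query for all articles at once,
  # like SELECT * FROM comments WHERE article_id IN (...).
  def comments_for_all(ids)
    @queries += 1
    ids.map { |id| [id, @comments.fetch(id, [])] }.to_h
  end
end

data = (1..1500).map { |id| [id, ["a comment on article #{id}"]] }.to_h

lazy = FakeDB.new(data)
lazy.article_ids.each { |id| lazy.comments_for(id) }
puts "lazy:  #{lazy.queries} queries"   # 1 + 1500

eager = FakeDB.new(data)
eager.comments_for_all(eager.article_ids)
puts "eager: #{eager.queries} queries"  # 2
```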
Beyond a few application errors like that, I’m guessing that the majority of the problems are in the database. Many Rails programmers forget that the database exists and that things like indexes are important if you’re running queries that aren’t using the primary key. MySQL has some decent enough tools to go through the queries and see how you could improve them and the Rails logs include the execution time of the queries it is running. The Rails logs are a wonderful place to start. You could search through them to find queries taking longer than X time and have MySQL analyze them.
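Here is a sketch of that log search. It assumes log lines of the form "Article Load (12.3ms)   SELECT ...", but the exact format varies between Rails versions, and older ones logged seconds rather than milliseconds. It strips ANSI color codes first. The slow_queries helper and the command-line usage are hypothetical:

```ruby
# Assumed log format (varies by Rails version):
#   "  Article Load (12.3ms)   SELECT * FROM `articles` ..."
ANSI_COLOR = /\e\[[\d;]*m/
SQL_LINE   = /\((\d+(?:\.\d+)?)ms\)\s+((?:SELECT|INSERT|UPDATE|DELETE)\b.*)/i

# Return [milliseconds, sql] pairs at or above threshold_ms, slowest first.
def slow_queries(lines, threshold_ms)
  found = []
  lines.each do |line|
    m = SQL_LINE.match(line.gsub(ANSI_COLOR, ""))
    next unless m
    ms = m[1].to_f
    found << [ms, m[2].strip] if ms >= threshold_ms
  end
  found.sort_by { |ms, _sql| -ms }
end

# Hypothetical usage: ruby slow_queries.rb log/production.log 100
if ARGV.size == 2
  slow_queries(File.foreach(ARGV[0]), ARGV[1].to_f).each do |ms, sql|
    printf("%8.1f ms  %s\n", ms, sql[0, 120])
  end
end
```

The worst offenders can then be run through MySQL's EXPLAIN to see whether they use an index.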
On the mongrel side of things: mongrel is an application server like AOLServer, but for Ruby applications. Unlike AOLServer, a mongrel running a Rails app handles only one request at a time. Rather, like Apache prefork, you use multiple processes to handle concurrency – only you have to manually launch the multiple processes and proxy requests to them. The strategy employed by most Rails applications had been to launch as many mongrel web servers as one expected concurrent users and have a load balancer in front of them. Today, Phusion Passenger (http://modrails.com) is a more common approach. Passenger is an Apache module that works like mod_php or other modules that embed a programming language in the Apache web server. It takes the hassle out of launching multiple app servers and proxying between them. However, there is still a setting, PassengerMaxPoolSize, that you should configure; it determines the maximum number of application server processes Passenger will spawn. This isn’t so different from the way that mod_php works – the more Apache processes get spawned, the better concurrency you get, until you start swapping and your box dies.
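The same arithmetic applies whether you run mongrels or set PassengerMaxPoolSize: the number of Ruby processes times their size has to fit in RAM alongside everything else. Here is a back-of-the-envelope helper. The per-process and reserved figures in the example are placeholders to be replaced with numbers measured on your own box (for example with ps or top):

```ruby
# Hypothetical sizing helper: how many Ruby app-server processes
# (mongrels, or Passenger's PassengerMaxPoolSize) fit in RAM
# without pushing the machine into swap.
def max_app_processes(total_ram_mb:, reserved_mb:, per_process_mb:)
  usable = total_ram_mb - reserved_mb   # RAM left after OS, web server, DB, etc.
  return 0 if usable <= 0
  usable / per_process_mb               # Integer division rounds down
end

# Placeholder figures: an 800 MB slice, 200 MB reserved, 150 MB per process.
puts max_app_processes(total_ram_mb: 800, reserved_mb: 200, per_process_mb: 150)  # prints 4
```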
Hopefully that clarifies some things. While I can’t say specifically what is wrong, I’d suspect these three things:
1. Code that is looping queries that can be corrected with the :include option.
2. The “wrong” application server setup. I’d switch to Passenger for ease, but if you want to stick with mongrel, look at your traffic to determine the number of app servers you need and stick the load balancer in front of several mongrels. But Passenger will take a lot of configuration and hassle out of the picture and make the app run as smoothly as mod_php or the like.
3. A database that might be missing needed indexes and optimization. Look through the Rails logs (stored in rails_directory/log/*.log). All of the SQL executed is right there.
Rails isn’t bad, but it can lead some programmers to assume that more magic is happening than actually is happening. It’s really easy to loop queries and, to be fair, just as easy to correct that problem. Likewise, Rails can make it easy to forget that one still needs to create indexes and know a bit more about one’s database.
Oh, and thanks for ACS and SQL for Web Nerds! Hope I was able to help.
philg: please don’t judge a diverse community based on one personal anecdote or whatever landed in your inbox in response to your blog post.
About specific technical aspects:
Using S3 takes the static files off your servers entirely, and is fairly cost-effective at both low and high volumes. Per-byte billing can be a win over 95th-percentile billing, depending on your load pattern. If performance matters, a CDN is usually worthwhile once you reach a minimum volume. More importantly, it scales with no developer or administrator effort, which is usually worth whatever margin it carries above doing it yourself.
Memcached vs. MySQL involves a lot of specifics. Memcached has a latency advantage that can be significant. It is also straightforward to use an arbitrary number of hosts with Memcached. Doing this with a database generally requires large changes to the application (sharding) or purchasing an exotic database from a vendor (Teradata), both of which are much more costly than pizza boxes running Memcached. Lastly, if your object size is smaller than the page size of the system, Memcached may give you higher hit rates per unit of RAM by avoiding fragmentation.
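Using an arbitrary number of hosts is easy because memcached clients pick a server for each key on the client side, typically by hashing the key. Below is a sketch using CRC32 modulo the number of servers. KeySharder and the host names are hypothetical. Many real clients use consistent hashing instead, so that adding a host does not remap most keys:

```ruby
require "zlib"

# Hypothetical client-side sharding: each key maps to exactly one server,
# so adding servers adds cache capacity without the servers coordinating.
class KeySharder
  def initialize(servers)
    @servers = servers
  end

  # Simple modulo hashing; changing the server count remaps most keys
  # (consistent hashing avoids that).
  def server_for(key)
    @servers[Zlib.crc32(key) % @servers.size]
  end
end

sharder = KeySharder.new(%w[cache1:11211 cache2:11211 cache3:11211])
counts = Hash.new(0)
(1..3000).each { |i| counts[sharder.server_for("product:#{i}")] += 1 }
counts.each { |server, n| puts "#{server}: #{n} keys" }
```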
Running a server without swap can be fine, but you need to think through what happens if the out-of-memory killer picks the wrong process, such as a lock manager for some shared resource.
I think it can be even simpler. The “MIT-trained” software engineer clearly wasn’t paying attention in 6.001… Just google “Ruby garbage collection”. There’s ample evidence that Ruby hasn’t really figured out how to do garbage collection yet (and things seem to be getting better slowly, at best). My first step would be to either use JRuby (if possible, since Sun has kindly written a good garbage-collecting VM) or start over.
Phil,
I’m one of the so-called “ruby on rails enthusiasts” whose comments you deleted, and the amendments you’ve made to the post certainly clarified the situation with the requirements for the various systems.
So I’ll eat my serving of humble pie and concede that your “inexperienced untrained-in-the-ways-of-Ruby mind” is bang on the money. Get a single box with no virtualization and a mountain of RAM. If the DB requirements grow move the web server/rails onto a separate box. If the web/app layer ever becomes the bottleneck you can scale it out easily by starting up extra instances and load balancing with a reverse proxy, including starting them up on demand on Amazon EC2 if you want to win bonus buzz word points by mentioning “the cloud”.
Best of luck to your friend in sorting it out,
Glenn
Hi Philip,
As a former ACS/OpenACS developer (6.5 years) and now Ruby on Rails developer (3.5 years), I’m going to have to take issue with several points in your e-mail. In no particular order:
Virtualization isn’t just a Ruby on Rails thing: it’s a system administration thing, and everyone’s doing it. Sure, you lose some direct access to the hardware, but you gain significant ease of administration: you can back up whole images at a point-in-time, you can trivially move the image to a new [physical] server, etc. A good friend of mine is a sysadmin for a large southern Ontario ISP and he will happily virtualize to a point where there is a single VM running on a single box, just to get these administration benefits.
ActiveRecord was a breath of fresh air to me after many years of SQL inside .adp or .tcl pages. Sure, it has its warts, but it was the first ORM I looked at where I thought “this is so much better than what I’ve been doing.” The 1500 queries in one page is an artefact of the programmer not taking the time to set up a join, which is usually quite trivial in ActiveRecord. I will confess that SQL itself is simpler and more powerful than the ActiveRecord interface, but the fact that ActiveRecord is returning objects and not rows is such an improvement that it’s well worth the small bit of awkwardness.
Some Rails hosting shops are very expensive. They get away with it because they’re also some of the most significant names in the biz. ~shrug~. There are other hosting options that are much more reasonably priced. I do all my hosting on Amazon EC2 (yes, the “cloud”) and I love it. Virtualized environments, easy backups, easy fail recovery, powerful boxes for cheap… but I’m also not afraid of a command line. These other shops are aiming to abstract away the command line.
I’ll agree with your cynicism when you said “Nearly all thought that a database management system was inherently horrifically slow.” There is a recent trend among RoR devs where RDBMS’s are considered slow and they are moving to key/value databases which are much faster at naïve queries. I tend to side with the argument you proposed back in 1997 where RDBMS’s are slow because of ACID compliance, and that’s a good thing. That said, RDBMS’s are the one remaining bottleneck in web development because they don’t scale [well/easily] beyond a single server. This doesn’t matter for 99% of the sites out there, but the 1% that remains is running into trouble on the RDBMS side, so I applaud the efforts going into key/value data store research. That said, I’m still using Postgres for my day-to-day work because I don’t like pain.
Finally: Many OpenACS developers (particularly the Scandinavian ones, and me the lone Canadian) were some of the earliest adopters of Ruby on Rails. We looked at RoR and with a collective breath we exclaimed “this is what I’ve been missing for all those years in OpenACS!” ACS/OpenACS was amazing at the time when everyone was writing their own web server from scratch and we could get a site up and running quickly. However, the state of the web world changed and Ruby on Rails was new and fresh, and it offered an elegant simplicity far beyond the simplicity we had loved in the ACS/OpenACS world. Since I switched, I’ve never looked back.
Don’t wish too much for a return of 1999. It’s actually better out there now. Web development has come a long way, and it’s generally gone in the right directions. The RoR problems you’re seeing are implementation-dependent, and not problems with the framework. And for that matter, I’m not naïve to the problems of the framework… there are many, but the ones you’re describing are not among RoR’s shortfalls.
Paul.
I love your take on this. After all it’s just points of data being passed back and forth over HTTP with some HTML markup to make it readable. CGI was sufficient but now we have abstractions upon abstractions running on virtualizations of virtual servers. Yesterday it was N-Tier and Application Server Farms, today it is Rails in the Cloud. Oh yeah, and relational databases are passé! We need to use something different (XML?) because we’ve found that these types of highly repetitive lookup operations we need to do to serve web pages are just not what relational databases are good at.
My god, it’s just computers spitting text at each other, not the Superconducting Super Collider.
I prefer Application Express from Oracle. I get a terrific database with an HTTP server and a framework for creating, serving and managing web applications that resides entirely in database tables. Need to know what part of my application uses a certain table? It’s all in the data dictionary. Tuning the app and tuning the database are the same thing, same tools, same techniques, all drawing from data in the same place. Oh and I only need SQL and PL/SQL now which is good because I’m tired of learning new front end languages just to concatenate text. Because it’s Oracle I can get object oriented if I really want to but in a stateless environment serving hypertext markup over a hypertext transport protocol I can’t really think of any good use for it.
I am a Rails developer (among many other hats) and I differ from the Rails community significantly in terms of how important I think the database should be. To me, an application is about the data it contains, and the database should be the primary layer that validates and constrains data.
The real issue is that many rails developers:
1) Assume the database is stupid and can only provide a glorified flat file-like storage layer, with all logic, constraints, foreign keys and validation living in Rails code,
2) Are lulled by ActiveRecord’s high-level interface. It insulates you from what’s actually going on under the covers, making it very easy to write inefficient code,
3) Emphasize elegant code over efficient execution, with an attitude of “RAM and hardware is cheaper than dev time” prevailing too often in discussions of application performance, and
4) Don’t know their tools, nor how all the stack fits together. They know Rails.
As to point 1 – I blame MySQL. Its poor support of real, useful RDBMS features convinced many developers that features like foreign keys, check constraints, function indexes, triggers, etc. were just too complicated and / or not REALLY useful. Ironically, many of the features of a real database got built into ActiveRecord – taking very fast and reliable RDBMS-level features and emulating them in Rails-level code. This makes very little sense to me, especially if you think long term or need to allow other non-rails apps access to your database. If you get your schema right first, your app can outlive any development framework du jour.
If only PostgreSQL had been the rails DB of choice. . . Fortunately, you don’t need to drink the flavor-aid and Postgres works just fine with ActiveRecord, along with foreign keys, triggers, check constraints, you name it.
I’ve gotta say that the “1500 queries” thing is ridiculously common – it drives me nuts.
The issue at hand is ActiveRecord’s “lazy loading” – say you select a set of records and iterate through them. ActiveRecord only loads a record’s associated objects from the database when you actually access the association. So you’ll run one query to get the set of records, and then 1500 more queries, one per record, to get its associated data. This is very easy to optimize away and a very easy trap to fall into at the same time – it comes down to knowing your tools, and is a reflection of points 2, 3, and 4.
Rails is a helluva web development framework, but it’s no substitute for knowing your tools, deploying correctly and using RDBMSs properly. It’s also flexible enough to allow for many different ways of developing applications, right or wrong. There’s no excuse for not understanding your tools, except the bad kind of laziness.
So in my experience, having worked with a number of rails zealots, and having been involved in a number of rails projects I have a few observations: (These by no means apply to every rails developer, but the attitudes are epidemic in the community)
1) Rails developers see sql as hard. This is why they wrap all their queries in the ORM mess they call ActiveRecord. In at least one case, one of the devs on a large rails project I was involved with went so far as to create an SQL DSL (The rails community loves Domain Specific Languages) because sql was too hard and not pretty enough. Never mind that SQL *is* a domain specific language, but I digress…..
2) Rails developers have developed a mythology about RDBMS being slow because they see sql as hard. Why learn some ‘hard’ technology if it’s so much slower?
3) Rails developers have their opinion of the speed of RDBMS confirmed regularly because ActiveRecord is so deeply sloppy with its SQL, and is so prone to creating n+1 queries for displaying data sets (or in one case I saw n*n*n+1, that was 7500 queries to load one page) that it makes all the round trips with the database look painfully slow.
4) The result is that caching in memory via constants or via memcached is seen as a superior solution.
5) The rails community has accepted these conclusions as a matter of faith. The rails meme causes those infected with it to become outright hostile to anyone who tries to present data as to rails flaws.
My 2c
Did you try the vm.swappiness sysctl on Linux? In my experience, you can set this to 0 and still have swap on the off chance that you need it.
Aside from that, anyone who’s actually good at RoR will know to use :select and :include to avoid getting blob/text columns unless necessary and prefetch associations, respectively.
Finally, virtualizing the database is a terrible idea. Using VMs kills IO, which is what the database is for in the first place.
So, the moral here is that when you jump into every new technology at once without understanding any of them, then you’re.. wait for it… an idiot?
Dan: After four or five professional Linux sysadmins gave up (all of them were in fact much more competent than us), it was Jin S. Choi who Googled around and found someone else who’d succeeded by removing swap. I’m not sure if anyone ever suggested touching the vm.swappiness sysctl. But really I don’t see the problem with having a machine that only has 12 GB of virtual memory (all of which happens to be physical memory). In ancient times you might have had 1 GB of real memory and a 5 GB swap file at the most. The potential virtual memory space was never infinite. If Linux and its processes leak memory so badly that they run out of 12 GB, the system will eventually reboot, no? If Linux and its processes are not leaking memory, but simply need a few GB to do a computation, the OS will start taking back memory from the file system cache, no?
I’m not sure if you followed the hyperlink from the original post regarding virtualization. The young guys that I work with decided to virtualize our development server, including the Oracle “slice”. The machine was a rat-simple Dell with two disks, mirrored via software RAID. Despite having zero users, the machine kept hanging. It turned out that you couldn’t use VMware and software RAID at the same time, so one of the guys spent a few days and $500 installing a RAID card and redoing the disks. It would have saved us approximately one calendar month and perhaps one full programmer-week of time if we had simply had everyone run in the same Linux.
Angry Rails Enthusiasts Whose Comments I Deleted: A lot of the comments were of the form “Your assertion that it is impossible to build a responsive Web site with Ruby on Rails is wrong. Rails is in fact great if programmed by a great mind like my own.”
The problem with this kind of comment is that I never asserted that Ruby on Rails could not be used effectively by some programmers. The point of the story was to show that the MIT-trained programmer with 20 years of experience and an enthusiasm for the latest and greatest ended up building something that underperformed something put together by people without official CS training who apparently invested zero time in exploring optimal tools. Could some team of Rails experts have done a better job with mitgenius.com? Obviously they could have! But in the 2+ years that our MIT graduate worked on this site, he apparently did not converge on an acceptable solution.
My enthusiasm for this story has nothing to do with bashing Ruby or Rails. I like this story because (1) it shows the fallacy of credentialism; an undergrad degree in CS is proof of nothing except that someone sat in a chair for four years (see http://philip.greenspun.com/blog/2007/08/23/improving-undergraduate-computer-science-education/ for my thoughts on how we could change the situation), (2) it shows what happens when a programmer thinks that he is so smart he doesn’t need to draft design documents and have them reviewed by others before proceeding (presumably another set of eyes would have noticed the mismatch between data set size and RAM), (3) it shows that fancy new tools cannot make up for skimping on 200-year-old engineering practices and 40-year-old database programming practices, and (4) it shows the continued unwillingness of experienced procedural language programmers to learn SQL and a modicum of RDBMS design and administration, despite the fact that the RDBMS has been at the heart of many of society’s most important IT systems for at least two decades.
Devon Jones: Sure, plenty of developers (not Rails developers) see SQL as hard. ActiveRecord was originally designed to make you use SQL, so you have to see the cost of what you’re doing and exactly what’s happening in the database. Some of the stuff they added on later sort of obscured things, although some of the new stuff they’re putting in now really helps. It’s a moving target, I guess.
It is a little too easy to make an underperforming site, for many reasons; it isn’t, however, difficult to see exactly what’s being sent to the database and to trace that back to particular parts of the code. It’s a little bit surprising to hear that the programmer sat there for two years in front of an underperforming app without ever doing anything about it.
Rails does have a lot of things going for it; innate speed isn’t one of them. Given that this doesn’t sound like a great programmer, I’d hate to see what he would have done with ASP or PHP. At least Rails enforces some level of separation, and it’s reasonably difficult to create code laced with SQL injection vulnerabilities and those other sorts of newbie mistakes.
The story as written doesn’t quite add up. Surely, Philip, you aren’t suggesting that a database is only usable if it fits in RAM? The concept of a working set is probably more helpful here. It is OK to go to disk up to the I/O limits of your fixed storage device. Obviously it’ll be faster the less you have to hit fixed storage, and ideal if the whole database fits in memory. But for large datasets, say above 64 GB, trying to throw RAM at the problem does get expensive.
I’m sure you are correct that memory was the issue, but I suspect it relates more to the fact that swap was needed just to keep MySQL running, leaving no space whatsoever for caching.
Nick: I know that it has sounded crazy to many of the programmers who’ve commented here, but yes I would actually go to newegg.com and spend $50 on 2.5GB of RAM to hold 2.5 GB of MySQL data. I guess you could argue that the $50 could be better spent on clever consultants who would come up with a dazzling architecture that would somehow align the mongrels into a harmonious kennel, but I personally prefer simple and stupid.
Can an RDBMS server run with less RAM than the total database size? Sure, but not this one. mitgenius.com has a table with 2 million rows of product information (given 1000 bytes of info on each product, that’s 2 GB right there). All 2 million rows are required to produce user pages. The thumbnail images are also required for user pages. With a sufficient number of users looking at a variety of products, all data required to produce user pages will be pulled from disk. If there isn’t enough RAM, the result will be thrashing. I’ve said this a few times in the original post and in follow-up comments. Many Ruby on Rails programmers have said that it would be crazy to have RAM >= data set, but they don’t say why. I’m not sure what it is about Ruby on Rails that is supposed to make thrashing impossible when RAM << data set.
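The thrashing argument can be put into rough numbers. The sketch below is a back-of-envelope model, not a measurement of mitgenius.com: it assumes every product row is equally likely to be requested (so a cache holding a fraction of the data hits about that fraction of the time), ignores index and page overhead, and uses a hypothetical load of 1,000 row fetches per second. The function names are illustrative.

```python
# Back-of-envelope model of disk traffic when the data set outgrows RAM.
# Assumption (hypothetical): every row is equally likely to be requested,
# so a cache holding C of N bytes hits about C/N of the time.


def dataset_bytes(rows: int, bytes_per_row: int) -> int:
    """Raw size of the product table, ignoring index and page overhead."""
    return rows * bytes_per_row


def expected_hit_rate(cache_bytes: float, data_bytes: float) -> float:
    """Steady-state cache hit rate under uniform random access."""
    return min(1.0, cache_bytes / data_bytes)


def disk_reads_per_second(row_fetches_per_second: float, hit_rate: float) -> float:
    """Row fetches that miss the cache and must be read from disk."""
    return row_fetches_per_second * (1.0 - hit_rate)


if __name__ == "__main__":
    data = dataset_bytes(2_000_000, 1000)          # 2 GB, as in the text
    for ram_gb in (0.5, 1.0, 2.0, 2.5):
        hit = expected_hit_rate(ram_gb * 1e9, data)
        # 1,000 row fetches per second is an illustrative load, not a measurement
        reads = disk_reads_per_second(1000, hit)
        print(f"{ram_gb} GB cache: hit rate {hit:.0%}, {reads:.0f} disk reads/s")
```

Under these assumptions disk traffic falls only linearly as RAM grows and reaches zero once the cache covers the whole data set. For scale, a single 7,200 rpm disk manages on the order of 100 random reads per second.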
What kind of RDBMS could run acceptably with less RAM than data? Let’s consider a 10 TB hospital electronic medical record system with data on patients going back to 1980, including some scanned X-rays. The lab results of a patient who was admitted in 1982 and who died in 1988 are unlikely to be requested frequently. An X-ray image from 1994 is probably something that can be left on disk until requested. So the hospital can run a server with less than 10 TB of RAM (per Jim Gray’s five-minute rule, referenced above and described in http://en.wikipedia.org/wiki/Five-minute_rule ; they predicted that by 2007 it would be a five-hour rule, and http://www.cs.cmu.edu/~damon2007/pdf/graefe07fiveminrule.pdf shows that any data required more often than once per six hours should be kept in RAM or Flash).
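The five-minute rule is a break-even calculation between the price of RAM and the price of disk accesses. Below is a minimal sketch of the break-even formula from the five-minute-rule papers; the page size, disk speed, and prices are hypothetical round numbers chosen only to illustrate the arithmetic, and the function names are illustrative.

```python
def break_even_interval_seconds(pages_per_mb_of_ram: float,
                                accesses_per_second_per_disk: float,
                                price_per_disk: float,
                                price_per_mb_of_ram: float) -> float:
    """Break-even reference interval: a page re-read more often than once
    per this many seconds is cheaper to keep in RAM than to fetch from disk."""
    technology_ratio = pages_per_mb_of_ram / accesses_per_second_per_disk
    economic_ratio = price_per_disk / price_per_mb_of_ram
    return technology_ratio * economic_ratio


def keep_in_ram(seconds_between_accesses: float, break_even_seconds: float) -> bool:
    """True if the page is referenced often enough to justify caching it."""
    return seconds_between_accesses < break_even_seconds


if __name__ == "__main__":
    # Hypothetical inputs: 8 KB pages (128 per MB), a $100 disk doing about
    # 100 random reads/s, and RAM at $0.02 per MB ($20 per GB).
    interval = break_even_interval_seconds(128, 100, 100.0, 0.02)
    print(f"break-even interval: {interval:.0f} s ({interval / 3600:.1f} h)")
    # A product page viewed hourly belongs in RAM; a decade-old lab result does not.
    print(keep_in_ram(3600, interval), keep_in_ram(10 * 365 * 86400, interval))
```

With these inputs the interval comes out near 6,400 seconds, a bit under two hours. Cheaper RAM or relatively more expensive disk accesses stretch the interval, which is why the rule keeps drifting from minutes toward hours.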
That’s not the situation we have here. Because the site is so slow, there have been no customers. Because there have been no customers, there are no old transaction data archived in little-used tables. The size of the total database is approximately the size of what he needs to produce public pages (at least to within $50 of RAM at newegg.com).
Why is this a controversial idea in the Ruby/Rails community? A 2U server with 64 GB of RAM is very cheap relative to programming labor costs. That is more than enough RAM to hold all of the data for almost any current Web application (excluding Google, Yahoo, and a few others). Why would you choose to have information swapping back and forth to disk when you could have it pulled into RAM on June 1st and mostly still be there for the Christmas shopping season?
I agree with your general points: we also ensure our SQL servers have enough RAM to fit the working set.
But any site that requires all 2 million rows (rather than just the indices pointing at them) to be actively retrieved when rendering *any* page is badly designed before even considering which web stack is being used.
Anyway, it’s a thought-provoking article. I believe the real issue is that to design a well-functioning system you have to understand exactly what is happening. Layers of abstraction such as those provided by Rails are fine as far as they go, but they don’t give you the freedom to ignore what the disk/CPU/memory/bus/network is doing when you use them.
Hi Phil,
your idea isn’t controversial at all. What I think is controversial is how much emphasis you’re putting on RAM as the solution to the problem.
That was the thrust of my (deleted) comment. An application that can’t handle 1 user searching a few million rows in a timely fashion sets off alarm bells for anyone sensible.
Somebody who understands Rails well enough to use the database for what it’s good at (rather than as a dumb bucket of bits) would be able to address the egregious mistakes that I believe are lurking in that code.
Once the application uses the database effectively it’s sensible to look at ways to help the DB do the job it’s good at. But until the newbie-level mistakes are addressed the app will still perform like a dog, regardless of how much RAM you throw at it.
Or to put it another way, if the app had been written using a language/framework you were really well-versed in, and it was equally wasteful in utilizing the database, would you really be putting so much emphasis on RAM as the solution?
I assert that you’d first correct the boneheaded mistakes and then reassess, even though you *know* that having an abundance of RAM can only help.
Regards, (the not-angry) Trevor.
Nick: I never said that all 2 million rows had to be retrieved to serve a single page on this site! Only that all 2 million rows were required to generate the complete set of public pages. Consider amazon.com. Customer A shows up at 12:01:00 and wants to look at televisions, browsing through 1000 products. Customer B shows up at 12:01:01 and wants to look at desk lamps, browsing through 1000 products. Customer C shows up at 12:01:02 and wants to look at laser printers, browsing through 1000 products. Customer D shows up at 12:01:03 and wants to look at kitchen knives, browsing through 1000 products. With one customer arriving per second and each customer looking at 1000 products, each in a different category, after 2000 seconds (less than one hour), the amazon.com server has had to pull 2 million rows from the disk.
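The arithmetic in the Amazon example can be checked directly. This sketch keeps the example’s assumption that each arriving customer browses a different, non-overlapping category; the function names are hypothetical.

```python
import math


def seconds_to_touch_every_row(total_rows: int, rows_per_customer: int,
                               customers_per_second: float) -> int:
    """Seconds until every row has been read at least once, assuming each
    customer browses a fresh category that no earlier customer touched."""
    rows_per_second = rows_per_customer * customers_per_second
    return math.ceil(total_rows / rows_per_second)


def distinct_rows_after(seconds: int, total_rows: int, rows_per_customer: int,
                        customers_per_second: float) -> int:
    """Distinct rows pulled from disk after `seconds`, capped at the table size."""
    return min(total_rows, int(seconds * customers_per_second * rows_per_customer))


if __name__ == "__main__":
    t = seconds_to_touch_every_row(2_000_000, 1000, 1)
    print(f"{t} s (about {t / 60:.0f} minutes) to touch all 2 million rows")
    print(distinct_rows_after(600, 2_000_000, 1000, 1), "rows after 10 minutes")
```

If customers picked categories at random instead, some categories would repeat and covering every row would take longer, but the whole table is still needed eventually.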
In any case, there is nothing necessarily wrong with a site that would look at all 2 million rows before building a page for a customer. Suppose you are running a dating Web site and the customer says “I want a guy who is about 35 years old, works as an oil painter, earns at least $70,000 per year, weighs less than 200 lbs., is at least 6′ tall, lives within 50 miles of my house, etc.” If you blindly put all of that into SQL WHERE clauses you would end up with zero results (since even with 2 million men in the database, no guy would be an exact match). Simply because you’re using an RDBMS does not mean that you must be a slave to SQL. You could take the customer’s search query and turn it into a scoring function, score all 2 million rows against her criteria, and then return the highest scorers. Thus you’d find the guy who was perfect but lived 52 miles away instead of 50, the guy who was perfect but earned $60,000 per year instead of $70,000, the guy who was a sculptor instead of an oil painter, etc.
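Here is a minimal sketch of that scoring approach, done in Python rather than inside the database: each criterion gives partial credit instead of acting as a hard WHERE-clause cutoff, every candidate is scored, and the top k are returned. The `Profile` fields, the weights, and the sample data are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    age: int
    occupation: str
    income: int          # dollars per year
    weight_lbs: int
    height_in: int
    miles_away: float


def strict_match(p: Profile) -> bool:
    """The blind WHERE-clause version: every criterion is a hard cutoff."""
    return (33 <= p.age <= 37 and p.occupation == "oil painter"
            and p.income >= 70_000 and p.weight_lbs < 200
            and p.height_in >= 72 and p.miles_away <= 50)


def score(p: Profile) -> float:
    """Higher is better; each shortfall costs points instead of disqualifying.
    The weights are hypothetical and would need tuning."""
    s = -abs(p.age - 35) * 1.0                       # about 35 years old
    if p.occupation == "oil painter":
        s += 10.0
    elif p.occupation in {"sculptor", "watercolorist"}:
        s += 6.0                                     # related occupation, partial credit
    s -= max(0, 70_000 - p.income) / 5_000           # earns at least $70,000
    s -= max(0, p.weight_lbs - 200) / 5              # weighs less than 200 lbs
    s -= max(0, 72 - p.height_in) * 2.0              # at least 6 feet tall
    s -= max(0.0, p.miles_away - 50) / 2             # within 50 miles
    return s


def best_matches(profiles, k: int):
    """Score every row and keep the k highest scorers."""
    return heapq.nlargest(k, profiles, key=score)


SAMPLE = [
    Profile("Bob", 35, "oil painter", 80_000, 180, 73, 52),
    Profile("Carl", 35, "sculptor", 75_000, 190, 72, 10),
    Profile("Dave", 35, "oil painter", 60_000, 185, 74, 20),
    Profile("Ed", 50, "accountant", 200_000, 250, 66, 300),
]

if __name__ == "__main__":
    print([p.name for p in SAMPLE if strict_match(p)])   # the WHERE-clause result
    print([p.name for p in best_matches(SAMPLE, 3)])     # ranked near misses
```

The strict filter finds nobody, while the scorer ranks the man 52 miles away first, then the one earning $60,000, then the sculptor. In a real system the score would more likely be a computed SQL expression with ORDER BY score DESC LIMIT k, so the rows never leave the database; either way, every row is read.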
philg: Right, I’m with you. But your model assumes completely random access to the dataset. In real life this wouldn’t happen, even on a site like Amazon.com as certain pages (say, ranked highly by Google or linked from the frontpage) would be hit far more frequently than others.
So say the database was 8 GB and 80% of the accesses related to maybe 20% of the data set (1,600 MB). Assuming a small amount of web traffic, as seems to be the case here, and 2 GB of RAM on the server, if disk access was sufficiently fast to cater for the remaining 20% of requests that weren’t in RAM, the site still might perform acceptably well.
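The 80/20 scenario can be explored with a small simulation. This is a sketch under simplifying assumptions: pages are equal-sized, the cache is a plain LRU (real buffer pools and the Linux page cache use refined variants), requests are independent, and 8,000 pages with a 2,000-page cache stand in for 8 GB of data and 2 GB of RAM. The class and function names are hypothetical.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fixed-size page cache with least-recently-used eviction (a simplified
    stand-in for a database buffer pool)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page: int) -> bool:
        """Return True on a hit; on a miss, load the page and return False."""
        if page in self.pages:
            self.pages.move_to_end(page)          # mark as most recently used
            return True
        self.pages[page] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)        # evict least recently used
        return False


def hit_rate(total_pages: int, cache_pages: int, hot_fraction: float,
             hot_access_share: float, accesses: int = 100_000,
             warmup: int = 10_000, seed: int = 1) -> float:
    """Simulate independent requests where `hot_access_share` of them go to
    the first `hot_fraction` of pages; return the post-warm-up hit rate."""
    rng = random.Random(seed)
    hot_pages = int(total_pages * hot_fraction)   # must be between 1 and total_pages - 1
    cache = LRUCache(cache_pages)
    hits = 0
    for i in range(warmup + accesses):
        if rng.random() < hot_access_share:
            page = rng.randrange(hot_pages)
        else:
            page = rng.randrange(hot_pages, total_pages)
        hit = cache.access(page)
        if i >= warmup:                           # ignore cold-start misses
            hits += hit
    return hits / accesses


if __name__ == "__main__":
    # 8,000 pages stand in for "8 GB"; a 2,000-page cache for "2 GB of RAM"
    print("uniform:", hit_rate(8000, 2000, hot_fraction=0.2, hot_access_share=0.2))
    print("80/20:  ", hit_rate(8000, 2000, hot_fraction=0.2, hot_access_share=0.8))
```

When `hot_access_share` equals `hot_fraction` the workload is uniform, and the hit rate should settle near 25%, the cache-to-data ratio. With the 80/20 skew it should be far higher but still below the ideal 80%, because cold reads keep evicting hot pages; that is the continual invalidation of useful cache the commenter anticipates.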
Yes, I realise that some of the useful cache will be continually invalidated, and the real-world performance would be worse than the theoretical performance, but I’m sure there are plenty of sites on the Internet that get by on such a configuration.
What would blow this out of the water is if the application were pulling in, on every request, millions of rows that it didn’t even need to consider, as I thought you were suggesting!
Nick: Do I win a prize for using a server with 2 GB of RAM ($42 worth) instead of one with 8 GB of RAM? Microsoft says that I can run Windows Vista on my desktop with 512 MB of RAM. Would that make me a better human being? Softer on the planet because I have saved the RAM forest in Borneo from being clearcut?
Trevor: In truth I don’t know how well this code will run given sufficient RAM that it never has to go to disk. I’m hoping that it will run fast enough that my friend won’t have to pay his contract programmer to improve the performance. Why not optimize the code first and only then buy a server with more RAM? If you’re paying a programmer by the hour it makes a lot more sense to pay him or her to work on a server with ample RAM. I wouldn’t want my friend to pay a programmer to wait for MySQL to page in a bunch of stuff. It will surely be faster and easier to debug the application on a system that never pages or thrashes. In fact, I would say that development for a site like this requires more resources than production. The 2 million product rows might all be updated with some new column of data, for example.
I find it interesting that Google’s cloud tries to keep everything in RAM over multiple machines (“Their performance gains are also impressive, now serving pages in under 200ms. Jeff credited the vast majority of that to their switch to holding indexes completely in memory a few years back. While that now means that a thousand machines need to handle each query rather than just a couple dozen, Jeff said it is worth it to make searchers see search results nearly instantaneously.” http://glinden.blogspot.com/2009/02/jeff-dean-keynote-at-wsdm-2009.html ).
Phil,
You’re correct that the point of swap is to make the system appear to have more RAM than it really does, so if you don’t need that, you can disable it. If the system does run out of RAM, it can either panic and reboot, panic and drop into a kernel debugger, or kill memory-hogging processes, at your option.
As for your comments on virtualization, I see two things going on. One is that you had a bad experience with one application and it happened to be VMware. The people involved were too stubborn to try a different approach once it became clear that sticking with VMware and the particular hardware setup wasn’t worth the effort.
The second problem, which is relevant to the story in this post, is that the “MIT expert” viewed virtualization as a panacea without understanding how contention for IO would affect an IO-heavy application.
Neither of these prove that virtualization is without benefit, only that sometimes the “experts” aren’t (seems to be a favorite theme of yours) or, alternatively, that smart people can make bad decisions if they focus too much on hype.
Finally, re: the RAM holy war going on, I’m neutral. With correct indexing (and the index in RAM) you should get acceptable performance. If everything is in RAM, you should get much much better performance. If it’s worth the money, do it. As they say, “disk is the new tape.”
I think what Phil is trying to suggest is that, if putting a bunch of RAM in your server is going to cost you only $100-200, why _wouldn’t_ you start with that?
Sure, all things considered it would be better to have more efficient code. I don’t think he’s suggesting “Buy more RAM, full stop.” is the entire solution. (And it certainly sounds like there are some rookie inefficiencies in the application’s code.)
But if spending a couple bucks on the RAM takes a lot of the pain away _immediately_ it’s foolish to not do it, immediately.
Dan: Thanks for the “disk is the new tape” expression!
Regarding your other comments… I did not set out to prove that virtualization is without benefit for anyone on the planet. However, most Internet applications are RDBMS applications. I cannot see how putting additional layers of software between the RDBMS and the disk drives is going to improve RDBMS performance. Also, on a system with three virtual machines each supporting one application, I’m now incurring the cost of maintaining four copies of Linux rather than one. As a guy with very poor Unix sysadmin skills, that does not appeal to me.
Great post, thanks. Quite funny. I am a Rails developer and I do see a lot of people afraid of SQL. I don’t think the community slant towards MySQL helps.
I don’t have much to add except to clarify a small point. Someone above said that Mongrel (a rough equivalent of Tomcat) was not multi-threaded. Mongrel is multi-threaded; Rails (until a very recent version, I believe) is not. Running multiple application servers is a way to dance around this problem: because multi-threading can be hard, Rails punted. On high-traffic sites this can pose quite a challenge, and it can also make the database the bottleneck, because session information will be stored there so that it can be shared across application servers.
Thanks for the thought-provoking article, philg. I love it.
I’m a big fan of Ruby on Rails and I’m really surprised by the negative reactions from the community. It seems to me your article aligns very well with the philosophy of RoR: keep it simple.
Philip,
One of the alternatives to ActiveRecord is DataMapper (mostly associated with Merb).
You’ll find more details here:
http://datamapper.org/doku.php?id=why_datamapper
With Rails 3.0 coming up, DataMapper may well get some more attention.
rgrds,
Johan
Hi Philip,
Great post. Some observations:
1) Programmers think modern “abstractions” protect them from having to worry about stuff like SQL, hardware, etc. Joel Spolsky of JoelonSoftware.com covered this very well in “The Law of Leaky Abstractions” (http://www.joelonsoftware.com/articles/LeakyAbstractions.html).
2) At most large shops the “database guy” and “developers” are two different classes of citizen. “Developers” are “real programmers,” whereas database guys are… database guys. The two don’t overlap. “Real programmers” don’t worry themselves with the database.
Philip-
I personally appreciate your caveman approaches. Your writings over the years about embracing simplicity in software and system design have been valuable.
It is extremely important to know what is going on under the hood of anything you do in code; the MIT Genius seems to have forgotten that. The virtue of the caveman approaches is that you know exactly what’s going on: this query is being run, and over here I loop over the rows, fetch them into RAM, or whatever.
The danger, in my view, is that modern languages have first paved over memory-management concerns (garbage collection) and now database overhead (ORMs). Lo and behold, we now have a new swath of powerful tools, all ready to shoot ourselves in the foot. Your ASP users were fortunate that they had no such tools at their disposal.
The lesson seems to have been forgotten: adding abstractions to a system generally makes it harder to understand and fix performance problems, and often makes them easier to create. The defensiveness of the rails developers on this point is bizarre.
There is nothing wrong with Ruby (the language) or MySQL, but in my experience database abstraction layers like those bundled with Rails always bite you. If you must use one, get one that allows you to override with hand-tuned SQL when you need to.
As for RAM utilization, your data set may fit in 2.5GB today, and make an application that makes full table scans (due to poor coding or lack of indexes) look like it is performing acceptably. But once the data size hits 2.500001GB, performance is going to collapse. It would be worth doing some profiling to find out which pages cause the most problems. Obviously, if the site is failing and time is critical, upgrading RAM is a sensible band-aid, but you can’t stop there.
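Why a tiny increase in data size can cause a collapse rather than a gentle slowdown: with plain LRU replacement, a scan that is even one page larger than the cache evicts each page just before it is needed again. The sketch below is a toy model, assuming a plain LRU cache and a query that scans the whole table on every request; InnoDB’s buffer pool uses a midpoint-insertion variant of LRU partly to blunt this effect, so real numbers will differ. The function name is hypothetical.

```python
from collections import OrderedDict


def scan_hit_rate(table_pages: int, cache_pages: int, passes: int) -> float:
    """Hit rate of a plain LRU cache when a query repeatedly scans the whole
    table in page order (the full-table-scan pattern)."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for page in range(table_pages):
            accesses += 1
            if page in cache:
                cache.move_to_end(page)          # mark as most recently used
                hits += 1
            else:
                cache[page] = None
                if len(cache) > cache_pages:
                    cache.popitem(last=False)    # evict least recently used
    return hits / accesses


if __name__ == "__main__":
    for pages in (999, 1000, 1001, 1100):
        print(pages, "pages:", scan_hit_rate(pages, 1000, passes=5))
```

While the table fits, only the first pass misses; once it exceeds the cache by a single page, every access misses, which is the cliff the comment describes.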
Regarding the Linux buffer cache problems, Linux’s VM subsystem has been unnecessarily rewritten so many times between 2.4 and 2.6 that it’s not surprising even experienced sysadmins could not figure it out. Solution: use a non-toy OS like Solaris or FreeBSD.
Finally, regarding Microsoft solutions: I have plenty of experience with Fortune 500 corporations who have Windows web hosting farms where IIS can’t even handle static file serving without clogging up with memory leaks, and our best Windows experts have absolutely no tools to figure out what the hell is going on. Solution: use the F5 load balancers in front of the Windows farm to offload that file serving to Apache running on Solaris machines with half the hardware resources. ASP.NET is fairly efficient, far more so than Java, but it is subject to sudden degradation in performance, and once that happens you are in a hole that is very difficult to dig out of.
This is a very interesting discussion and I wanted to add my $0.02 worth.
First off, I believe in giving MySQL all the memory I can; the more memory, the happier it will be, period. Memory is so cheap these days that not doing so makes no sense. It is preferable to limit the dataset size so that it fits in memory, but that is not always possible; at the very least, make sure you can fit the indices in memory. Also, it is best to allocate the memory to MySQL as opposed to the Linux file cache. Bear in mind that there is overhead to MySQL, so for InnoDB a good rule of thumb is 3x overhead.
Also, there are some patterns to avoid when you are trying to get the best performance out of MySQL, such as joins, large selects, table scans, and unindexed selects, updates, and deletes. Nothing out of the ordinary, really.
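An unindexed select is easy to spot in the query plan. This sketch uses Python’s built-in sqlite3 module as a stand-in for MySQL (MySQL’s EXPLAIN reports a full scan as type ALL instead), with a hypothetical `products` table; the exact plan wording varies between SQLite versions, so don’t rely on the precise strings.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)   # last column is the detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO products (category, title) VALUES (?, ?)",
    [(f"cat{i % 200}", f"product {i}") for i in range(20_000)],
)

sql = "SELECT id, title FROM products WHERE category = ?"
before = query_plan(conn, sql, ("cat42",))   # expect a full table scan
conn.execute("CREATE INDEX idx_products_category ON products (category)")
after = query_plan(conn, sql, ("cat42",))    # expect a search using the index

print("without index:", before)
print("with index:   ", after)
```

Checking plans like this for the handful of queries behind each public page is cheap, and it shows whether the indices you are trying to keep in memory are actually being used.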
I am usually against using Memcached for a few reasons. Why allocate memory to Memcached when you can allocate it to your DBMS? Using Memcached adds another level of management to the application, which adds to the complexity (aging data, etc.). Memcached is usually trotted out to work around performance problems rather than fix them, because it is a quicker solution. Finally, Memcached stores blobs of data that need to be serialized and deserialized. That being said, I would use Memcached where it makes sense.
I was not aware that Linux limited the amount of space allocated to caching files. That may have changed; my understanding is that current releases are very aggressive with caching, but I could be wrong.
Running Linux with no swap is fine; I know of some MySQL admins who do it. The only thing to be aware of is that Linux will kill off the largest user process if it runs out of memory, which on a machine running MySQL is usually MySQL.
One trick I have used in the past is to allocate as much shared memory as there is RAM in a machine and to copy files from the file system to the /dev/shm file system, which Linux tries to keep in RAM as much as possible. I wrote a post about this a while back: http://fschiettecatte.wordpress.com/2007/05/27/sharing-the-memories/
Very nice article. While I really like Rails, I utterly hate the “magic” because it provides too much abstraction away from what’s happening. If you ask me, Rails and Domain Specific Languages are both “leaky abstractions” and not in the good way – Ruby and Rails especially hide all the complex stuff behind a facade to make it look like English and use this as a selling point (“The code reads like prose!”) when in fact it’s one of the absolute worst things that can be done, because you get a lot of “cargo cult” programmers who follow along without ever really understanding what belongs_to or has_many is doing under the hood.
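To make the point about what belongs_to or has_many is doing under the hood concrete, here is the classic N+1 query pattern that lazily loaded associations can produce, next to a single JOIN. It is a sketch in Python’s sqlite3, not Rails code: the two listing functions are hypothetical imitations of what an ORM might emit, and `set_trace_callback` counts the statements that actually reach the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY,
                           category_id INTEGER REFERENCES categories(id),
                           title TEXT);
""")
conn.executemany("INSERT INTO categories (id, name) VALUES (?, ?)",
                 [(i, f"category {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO products (category_id, title) VALUES (?, ?)",
                 [((i % 10) + 1, f"product {i}") for i in range(100)])
conn.commit()

statements = []                                # every SQL statement SQLite runs
conn.set_trace_callback(statements.append)


def lazy_listing():
    """What a lazily loaded belongs_to can do: 1 query, then 1 more per row."""
    out = []
    for title, category_id in conn.execute(
            "SELECT title, category_id FROM products").fetchall():
        (name,) = conn.execute("SELECT name FROM categories WHERE id = ?",
                               (category_id,)).fetchone()
        out.append((title, name))
    return out


def joined_listing():
    """The same data in a single round trip."""
    return conn.execute("SELECT p.title, c.name FROM products p "
                        "JOIN categories c ON c.id = p.category_id").fetchall()


statements.clear()
lazy = lazy_listing()
lazy_count = len(statements)

statements.clear()
joined = joined_listing()
joined_count = len(statements)

print(f"lazy: {lazy_count} statements, join: {joined_count} statement(s)")
```

Both functions return the same rows; the difference only shows up in the statement count, which is exactly what the English-like association syntax hides.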
Hi Phil –
It looks like one of your former SEIA students, Dan Chak, just published a book on scaling Rails.
It’s called “Enterprise Rails,” and Hal Abelson reviewed it, saying, “Enterprise Rails is indispensable for anyone planning to build enterprise web services. It’s one thing to get your service off the ground with a framework like Rails, but quite another to construct a system that will hold up at enterprise scale. The secret is to make good architectural choices from the beginning. Chak shows you how to make those choices. Ignore his advice at your peril.”
http://www.amazon.com/Enterprise-Rails-Dan-Chak/dp/0596515200/
Chak comes from an ACS/AOLserver background and worked at Amazon, and I started using ACS/AOLserver back in the nineties so I was intrigued by his view on Rails.
All the best.
James
hi Philip,
You’re careful not to identify the credentialed programmer who perpetrated the mess, and I’m curious about the ethics of that. At what point do you identify him and open him up to his richly deserved public humiliation? People like the guy you describe give us all a bad name.
Nathan: The programmer never thanked me for pointing out that his world of slices did not have sufficient RAM, so maybe I should identify the ungrateful non-planner/non-documenter. However, I fear that his sins are all too common. He may not be Everyprogrammer, but his style of work certainly has plenty of company in the Web development world. Plus, how can I hate the guy who inspired http://philip.greenspun.com/software/design-review ?
If you have enough RAM to hold all data, you should be much more stupid. What’s the point of having an RDBMS then? What you need is a transaction log and a memory dump. That’s called Prevayler. No need to copy data from the database to the webapp and back again.
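For readers unfamiliar with Prevayler, the pattern it implements, often called “system prevalence,” fits in a few dozen lines. This is a hypothetical Python sketch of the idea, not Prevayler’s Java API: the whole data set lives in a dict in RAM, each change is appended and fsynced to a journal before it is applied, and an occasional snapshot lets recovery replay only the journal written since. The class, file names, and command format are illustrative.

```python
import json
import os
import tempfile


class PrevalentStore:
    """In-RAM key/value state made durable by a write-ahead journal plus
    periodic snapshots (a sketch of the system-prevalence idea)."""

    def __init__(self, directory: str):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.journal_path = os.path.join(directory, "journal.jsonl")
        self.state = {}
        self._recover()

    def _recover(self):
        # Load the last snapshot, then replay commands journaled after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.state = json.load(f)
        if os.path.exists(self.journal_path):
            with open(self.journal_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, command: dict):
        if command["op"] == "set":
            self.state[command["key"]] = command["value"]
        elif command["op"] == "delete":
            self.state.pop(command["key"], None)

    def execute(self, command: dict):
        # Write-ahead: make the command durable before changing RAM.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps(command) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(command)

    def snapshot(self):
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)          # atomic swap-in
        open(self.journal_path, "w").close()         # journal now covered by snapshot


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = PrevalentStore(d)
        store.execute({"op": "set", "key": "isbn-1", "value": "Harry Potter"})
        store.snapshot()
        store.execute({"op": "set", "key": "isbn-2", "value": "Dune"})
        print(PrevalentStore(d).state)   # a "restarted" process recovers both keys
```

Because set and delete are idempotent, a crash between swapping in the snapshot and truncating the journal is harmless: replaying the old journal over the new snapshot yields the same state. The sketch leaves out concurrency, ad hoc querying, and any record of which program made which change.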
Stephan: Why not just have all of the threads/programs simultaneously update a big data structure in RAM, rather than use a DBMS? Occasionally computer programmers have been known to make mistakes, which results in bugs in software. If important data have been deleted or invalid data written, it is nice to be able to roll back part or all of the database to an earlier time. The SQL interface also means that it is easy to figure out which applications requested which changes to the database. Now that a standard-priced server can hold 256 GB of RAM, most transaction-processing databases would fit in RAM. So either Oracle is about to go out of business (current market cap around $150B) or corporate IT customers are paying for something that you haven’t considered.