Chapter 3: Scalable Systems for Online Communities
by Philip Greenspun, part of Philip and Alex's Guide to Web Publishing
Revised June 2003
Our media does not portray the Michigan Militia (michiganmilitia.com) as a primarily educational institution. Yet a new member must learn where and when to meet, a body of Constitutional law, field communication skills, firearm safety, and marksmanship. To rise in the organization, a member must learn how to lead and educate other members.
Suppose that you decide to adopt a dog. You have to learn about the characteristics of different breeds. After choosing a breed, you have to learn about good breeders in your region. After choosing a puppy, you have to learn about training and learn about good vets in your city. You have to learn what brand of dog food is best and where to buy it. You have to learn where it is safe and legal to let your dog off the leash so that he can run and play with other dogs. Virtually all of this education will happen through informal contacts with more experienced dog owners, none of whom will set up a classroom or expect to be paid.
If you go to work as a computer programmer in a big company, the more experienced employees will have to show you where to find the water cooler, explain to you the significance of the project on which you're working, tell you how much of the work has been done so far, teach you how to use the software development tools, and demonstrate the fine points of the relational database management system on which the system you're building relies.
This definition embraces the traditional physical university. Professors and Ph.D. students work with undergraduates to help them learn enough to graduate. This definition is not large enough to embrace a physical small town or city neighborhood, which is what most people usually mean when they use the word. Newcomers to a residential community will need to learn how to get to the supermarket but otherwise are not likely to be pursuing a productive goal in common with other residents.
One answer is that not everyone can have the real thing. Many people wish to learn who cannot afford Ivy League tuition. Many people wish to learn who cannot afford to stop working for four years. Many people wish to learn whose responsibilities or disabilities prevent them from traveling to a university campus.
Companies can have pretty much whatever they want. Certainly they have plenty of money to build lavish offices. Yet isn't there something odd about a workaday world at the turn of the millennium that Bartleby the Scrivener (1853) would find utterly familiar? Workers come in from their homes each morning to settle into individual offices where they find the paper documents necessary for their work. With better technology and management techniques, perhaps it would be possible to benefit from contributions by part-time workers or workers who don't leave their houses. Perhaps projects could be finished sooner by workers cooperating in rooms devoted to the project rather than isolated in offices mostly devoted to storing paper documents from previously completed projects.
If you still feel that physical communities must always be superior to electronically linked communities, let me ask you to ponder three words: junior high school.
Junior high school throws people together who have nothing in common besides parents who chose to locate in a particular neighborhood. Unless you're very adaptable, it is tough to find good friends. High school is more or less the same idea, but the pool of people is usually larger so it is more probable that kids with uncommon interests will find soulmates. In college, not only is the pool larger but there can be a concentration by personality type. Nerds find each other at Caltech and MIT; hippies find each other at Bard and Reed; snobs find each other at Harvard and Princeton; skiers find each other at state schools in Colorado and Vermont. When students graduate and go to work, they usually don't make as many friends. They aren't meeting as many people and the common thread of "do not want to starve in street" doesn't tie them very tightly to other workers.
What we can infer from this is that people make the best friends when the pool is large and the interests are common. Enter the Internet, which affords instant communication among millions of people worldwide. It isn't possible to find a pool on a comparable scale except in the world's largest cities. Given the Internet's raw communication capability and huge pool of potential friends, if you want to make a really great friend you just need a means of finding someone who shares your interests and then a means of collaborating with him or her.
To summarize, here are the new things that we can do with online communities:
What do we conclude from these observations? Technology profoundly affects the type of community that can be sustained and the extent to which information flows from few to many or from many to many.
Site growth can outstrip the capacity of any person, no matter how dedicated or efficient.
One typical reaction on the part of the publisher is to turn the formerly non-commercial site into a showcase for whoredom. Users return to find six banner ads on the home page and links to kickback-paying referral partners obscuring content that had formerly been highlighted in relation to its utility. With all of the money flowing in, at least the publisher's scaling problems are history. More users means more page loads means more banner ads served means more revenue. The publisher can hire a discussion forum moderator and a customer service staff. Money can be used to hire writers and a webmaster to organize their contributions. The content may be bland and tainted by commercial interests, but at least the publisher is making a fat profit.... Oops! In practice, nearly all commercial community site publishers are losing money because the cost per user to maintain the site is too high.
Corporate intranet communities also need to scale. It really would be sad to have to hire a new moderator, webadmin, or sysadmin for every new employee. Yet the intranet community should be as vital as any public community site. If an employee sends another employee private e-mail asking a how-to question, that should be regarded as a failure of the intranet community software. Why wasn't it more efficient for these folks to collaborate using a Web service that would then archive the discussion?
Based on the author's experience with hundreds of Internet applications, it seems that all successful sites share the following six required elements for a sustainable online community:
Before any Web publisher contemplates running an online community, it is probably worth stepping back to ask which components of the software should be built, which developed cooperatively with other publishers, and which can and should be purchased off-the-shelf.
Bottom line: a fortune will be spent on programming; schedules will slip; the users will get reamed by all the bugs; if the site survives it will be so expensive to maintain that all the maintenance jobs will have to be exported to the Third World. Just as with any other custom software development effort.
Will this always be true? No. The basic argument of this chapter is that the Web server-side software industry's development has recapitulated, and will continue to recapitulate, the development of the business data processing software industry.
In the 1970s, people would buy a commercial database management system plus some kind of iron on which to run it. They were tired of suffering with bugs in their programmers' ad hoc database management schemes and figured that their data storage needs weren't any different from hundreds of other computer users. Company programmers were used now only to write data models and application programs.
In the 1990s, people buy an enterprise software system such as SAP or Oracle Financials. They then buy an RDBMS to support it. They finally buy some iron to support the RDBMS. Company programmers are used only for some customization of the canned data models and apps.
Note that over these 40 years, there has been a huge transfer of power from iron vendors to data model/app vendors. Savvy companies realized this and adapted. IBM, for example, went heavily into the DBMS software business and then into business apps. On some days, you can go to the Oracle Web site and never learn that they make an RDBMS; the whole front page is given over to promoting the packaged business applications that they also sell. Not-so-savvy makers of iron or DBMS software have been nearly destroyed, e.g., Digital and Sybase.
Maybe it is a combination of all three. In any case, if your company handles payroll exactly the same way that Wombley's Widgets, Inc. does and they have a working program to do it, then you might as well use Wombley's software. If you hire programmers to build it from scratch, the best case is that you'll spend some money and get a working system. In the expected case, you'll spend a few years and millions of dollars working through all the bugs that Wombley's Widgets worked through five years before. The worst (and surprisingly common) case is that you'll spend ten years and tens of millions of dollars before having to scrap the whole project.
| Described by the engineers who built the software | Described by the marketing department |
|---|---|
| Here's a collection of hacks that we've assembled after building data processing systems for 15 companies. We're sorry that we never really finished it and that it doesn't do everything you need and that it will take 50 programmer-years to fill in the cracks and make it work for your business. But when you're done you'll probably have fewer bugs than if you'd started from scratch. | This is a comprehensive turnkey business data processing system already in use by 15 large and sophisticated companies. It does absolutely everything you need and is so flexibly designed that it will only take you 50 programmer-years to customize for your unique business practices. |
At Oracle Open World in 1999, Larry Ellison asked the audience to imagine if they bought a car the way that smart business people buy computer systems:
"BMW has the best fuel injection so I'll get BMW fuel injection. I really like those big Jeep wheels so I'll get Jeep wheels. I like the Mercedes engine and I'll put it all into a Porsche body. I'll have the best car in the world because each component is best-of-breed.
"People buy cars from one company at a time. This is why cars are cheap and reliable. Computers were supposed to make people more productive but because of the way people buy software, our industry has created a worldwide labor shortage."
The alternative is to build an integrated system, running out of a single relational database. You might not get every last feature of every last best-of-breed application, but it won't cost $10 million of system integration time to ask simple questions across modules.
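As a rough illustration of what "asking a simple question across modules" looks like when everything lives in one relational database, here is a minimal sketch using Python's built-in sqlite3 module. The schema is hypothetical: the table names (users, forum_posts, orders), the sample rows, and the function posters_who_bought are invented for this example and are not taken from any real toolkit.

```python
import sqlite3

# Hypothetical schema: a single users table shared by two "modules",
# a discussion forum and an ecommerce order system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE forum_posts (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        subject TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        amount_cents INTEGER
    );
    INSERT INTO users VALUES (1, 'Judy'), (2, 'Sam'), (3, 'Lee');
    INSERT INTO forum_posts VALUES
        (1, 1, 'Lens advice'), (2, 1, 'Tripods'), (3, 2, 'Film');
    INSERT INTO orders VALUES (1, 1, 2500), (2, 3, 4000);
""")


def posters_who_bought(conn):
    """Return (name, post_count, total_cents) for users active in both modules."""
    return conn.execute("""
        SELECT u.name,
               (SELECT COUNT(*) FROM forum_posts p
                 WHERE p.user_id = u.user_id) AS n_posts,
               (SELECT SUM(o.amount_cents) FROM orders o
                 WHERE o.user_id = u.user_id) AS total_cents
          FROM users u
         WHERE EXISTS (SELECT 1 FROM forum_posts p WHERE p.user_id = u.user_id)
           AND EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id)
         ORDER BY u.name
    """).fetchall()


print(posters_who_bought(conn))
```

Because the forum and the order system share one users table, "which forum participants are also customers?" is a single query. With separate best-of-breed products, each keeping its own user records, the same question would first require reconciling two sets of identities.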
Suppose that the total space of world-wide information system needs includes 100,000 features. A 90% solution will do 90,000 of those right out of the box. Does that mean that 90% of the world's sites can be built without writing any custom code? Only if each site required just one feature. If a site requires 10 features, there is only a 35% chance that the system can be built without new programming. If a site requires 20 features, the chance drops to 12%. If the site requires 100 features, which is getting to be a typical commercial situation, there is only about a 3 in 100,000 chance that the publisher will get away without writing any code.
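The percentages above come from raising 0.9 to the number of features required. A small sketch of that arithmetic follows; it assumes that each needed feature is covered by the toolkit independently with probability 0.9, and the function name is invented for this example.

```python
def chance_no_custom_code(features_needed, coverage=0.9):
    """Probability that every needed feature is already in the toolkit,
    assuming each one is covered independently with probability `coverage`."""
    return coverage ** features_needed


for n in (1, 10, 20, 100):
    print(f"{n:>3} features: {chance_no_custom_code(n):.4%}")
```

The independence assumption is a simplification. Real sites tend to need clusters of related features, so the true odds could be somewhat better or worse, but the exponential decay is the point.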
Let's make this concrete. We're building foobar.com, which requires 100 features. We have a choice of Toolkit40 or Toolkit90. With Toolkit90, we only expect to have to program 10 new features as extensions. With Toolkit40, we have to program 60 new features. So it should be six times as much work to use Toolkit40, no?
In software development, the vast majority of the code base is developed to accomplish the last few percent of the features. Thus, Toolkit40 will only have one-twentieth as many lines of code as Toolkit90. The programmers working from Toolkit40 will have to write six times as many features, but they will only have to read one-twentieth as much code. Furthermore, programming within a complex system may require more lines of code. So it is possible that the 10 new features built on top of Toolkit90 will contain more code than the 60 new features built on top of Toolkit40.
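To see how this could play out, here is a toy model. Only the 20-to-1 ratio in toolkit size and the 10-versus-60 feature counts come from the discussion above. The absolute line counts, the lines needed per new feature, the fraction of the toolkit that must be read, and the relative cost of reading versus writing a line are all made-up numbers chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Toolkit:
    name: str
    toolkit_loc: int        # lines shipped in the toolkit (hypothetical)
    features_to_write: int  # features the project must add (from the text)
    loc_per_feature: int    # lines per new feature (hypothetical)


def new_code(t):
    """Lines of extension code the project team has to write."""
    return t.features_to_write * t.loc_per_feature


def total_effort(t, read_fraction=0.1, read_cost=0.2, write_cost=1.0):
    """Effort in arbitrary units: read part of the toolkit, then write extensions.
    All three keyword parameters are illustrative assumptions."""
    reading = t.toolkit_loc * read_fraction * read_cost
    writing = new_code(t) * write_cost
    return reading + writing


# Toolkit90 is 20 times the size of Toolkit40, as in the text. Features are
# assumed to take more lines each inside the more complex system.
toolkit40 = Toolkit("Toolkit40", toolkit_loc=50_000,
                    features_to_write=60, loc_per_feature=200)
toolkit90 = Toolkit("Toolkit90", toolkit_loc=1_000_000,
                    features_to_write=10, loc_per_feature=1_500)

for t in (toolkit40, toolkit90):
    print(t.name, "new code:", new_code(t), "total effort:", total_effort(t))
```

With these invented inputs the Toolkit90 project writes more extension code for its 10 features than the Toolkit40 project writes for its 60, and also pays far more in reading time. Different inputs would give a different answer; the model only shows that "six times as many features" need not mean "six times as much work".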
Another issue is that good programmers would rather write programs than read programs. So a project built on top of the lean, clean Toolkit40 will attract better people than a project built on top of bloated confusing Toolkit90. When you add in the fact that good programmers are 20 times more productive than average programmers, the probability of a success with Toolkit40 is much larger than the probability of success with Toolkit90.
What evidence is there of the truth of the foregoing? The ERP market contains products that aim to solve 100% of corporations' accounting problems. ERP software, of which SAP is the best-known example, comes very close to including 100% of the required features. Each company that adopts an ERP system only needs a handful of new features. Yet because of the complexity of the ERP toolkit as shipped, those features will take several years, 100 programmers, and $50 million to implement.
The following figures illustrate this point:
Figure 3-1: The outer rectangle contains the union of all the features that any organization might want from an information system. Individual features are represented by dots. An attempt to solve 100% of the problem and accomplish all of the desired features will, due to the nature of engineers, invariably yield at best a solution to 90% of the problems. The oval inside the rectangle shows the portion of the possible features accomplished by the software product.
Figure 3-2: Imagine a particular organization building an information system based on the 90%-solution toolkit. Most of what they want is indeed handled by the packaged software and only a little bit of custom programming need be done (hatched area). Unfortunately, the toolkit is so unwieldy due to its attempt to solve 100% of the problem that this little bit of coding takes years.
Figure 3-3: If you map the same organization's information system onto a less ambitious toolkit you can see that the amount of extension programming goes up considerably. What you don't see is that the total implementation effort may be much lower because the underlying toolkit is much simpler. There the programmers need spend much less time reading documentation, fitting their new software into the old, etc. Sometimes less is more.
If you want to build it yourself but with a little assistance on development and structure, consider working through the steps outlined in Internet Application Workbook (http://philip.greenspun.com/internet-application-workbook/), a textbook used at MIT by teams of students building online communities from scratch. Most of the teams choose to start with the Microsoft .NET tools.
If you want to find some open-source toolkits that can speed the development process, Microsoft distributes quite a few for the .NET environment. The toolkit that evolved from the old photo.net online community is available from www.openacs.org.
If you want to save yourself several hundred thousand dollars, see if an off-the-shelf multi-user server-based Weblog ("blog") application will solve your problems. A good example of this kind of product is Manila, available from manila.userland.com at a retail price of between $300 and $900. Manila is open-source and its behavior can be modified in a safe scripting language.
Consider a Web site with 1000 static .html files, a discussion forum, and all the services and information above. An expert shows up at the site and begins to participate in the discussion forum and comments on some of the static pages. I want the software to automatically recognize that this person is an expert. If the expert asks "What 1001st static document can I write that will help the community the most?" I want the software to be able to suggest some topics. This example is as hard as the entire artificial intelligence problem and could occupy brilliant computer scientists for decades.
Brilliant computer scientists? The same ones that brought Microsoft Blue Screen of Death (TM) to your desktop and "server not responding" to your favorite ecommerce site? Or perhaps you'd rather trust the authors of the code that delayed the opening of Hong Kong's new $20 billion airport then crippled operations and left stranded passengers smelling dead fish and rotting fruit from the stalled cargo terminal.
On second thought maybe we should try to let the community users handle some of the programming themselves. Most of the Web technology that you can buy off-the-shelf presumes a mainframe-style "priesthood-that-develops-what-users-need" world, complete with the three-tiered architecture that shut down the Hong Kong airport. The best systems to support online communities, on the other hand, are built in such a way that genuinely hard things are left to a standard commercial relational database management system. Things that don't have to be hard are done in a safe interpreted computer language so that novice programmers running the community can modify and extend the software.
If we're smart enough to develop safe and effective languages, the power of programming need not be limited to the maintainers of a community Web site. The most useful and innovative services of all are often algorithms specified by users that run on the publisher's server, e.g., "send me mail every Monday and Thursday nights if there are any new articles by my friend Judy".
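A user-specified rule of this kind can be represented as plain data that the publisher's server evaluates on a schedule, which is one way to keep it safe: the user supplies parameters, not arbitrary code. The sketch below is a minimal illustration, not a description of any existing system. The class names, the function articles_to_mail, the e-mail address, and the choice to treat "night" as 18:00 or later are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Article:
    author: str
    title: str
    posted: datetime


@dataclass(frozen=True)
class AlertRule:
    subscriber_email: str
    author: str
    weekdays: frozenset      # datetime.weekday() values: Monday=0 ... Sunday=6
    earliest_hour: int = 18  # "night" taken to mean 18:00 or later (assumption)


def articles_to_mail(rule, articles, now, last_sent):
    """Return the articles to mail right now, or [] if the rule does not fire."""
    if now.weekday() not in rule.weekdays or now.hour < rule.earliest_hour:
        return []
    return [a for a in articles
            if a.author == rule.author and last_sent < a.posted <= now]


# "Send me mail every Monday and Thursday night if there are new articles by Judy."
rule = AlertRule("reader@example.com", "Judy", frozenset({0, 3}))
articles = [
    Article("Judy", "Choosing a tripod", datetime(2003, 6, 2, 9, 0)),
    Article("Sam", "Film scanners", datetime(2003, 6, 2, 10, 0)),
]
last_sent = datetime(2003, 5, 29, 20, 0)

monday_night = datetime(2003, 6, 2, 20, 0)
print([a.title for a in articles_to_mail(rule, articles, monday_night, last_sent)])
```

Actually sending the mail and recording the new last_sent time are left out. The point is that the user's "algorithm" reduces to a few fields that the server can check cheaply for every subscriber.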
The programming chapters of this book illustrate the power and reliability of this software architecture for ecommerce and Web applications that replace desktop apps.
I want the Title here
If I don't know HTML, how would I add a title or apply formatting to my comments?
-- Rajesh A. S. Pethe, October 8, 2004
I think it more important than ever to keep a tight rein on the features incorporated into an application.
I built a simple order processing system whose main purpose was to enable lots of new items to be quickly entered onto the system without first setting up stock parts. The system was developed with Delphi 4 and Access 97. It worked fine. Then along comes Delphi 2005 together with souped-up third-party controls, Microsoft SQL Server Express, .NET, etc. All this newly available technology exerts a pull and you soon start asking yourself: How can this new technology be used to improve my existing application? It is usually not difficult to find ways in which it might.
SQL Server is much more secure than Access 97 for networking. I have to admit that, but for pressure of other work, I was sorely tempted to have a go at a technological upgrade. Fortunately, I realised it would be a major undertaking. As an afterthought I wondered how the application features themselves might be improved on the current platform. I couldn't see much scope as it was a fairly bare bones system and I would want to leave any major new features until I upgraded and could take advantage of new technology. But I was wrong.
For all the standard reasons, the idea was that orders would be entered sometime before the goods were delivered so that when they arrived they could be quickly checked in and we could check we only got what we had ordered. However, as the goods were always new to the organisation and were not fully described on the order forms, they could only be entered after they had been delivered and examined. Really, the order system wasn't being used as intended. However, it was still necessary to go through all the effort of entering orders before deliveries could be accepted. All we really needed was a system to record deliveries. At a stroke a large part of the system could be eliminated.
I'm sure the above small scale example must apply to many larger scale systems. I was interested to read that as recently as 2003 Zara (one of Europe's most successful fashion chains) was running its tills on the DOS operating system and moving data around terminals on floppy disc - see "Zara: IT for Fast Fashion" Harvard Business School 9-604-081. It makes you wonder, if one of the most successful companies can operate this way, what the less successful companies are doing with their networked computer systems. Have they really thought through their requirements, or are the IT professionals effectively in charge and application upgrade = technological upgrade?
-- Andrew Johnson, November 25, 2005
All this is just as relevant today as when you wrote it. Many thanks for this stuff.
-- Neil Roberts, November 12, 2007