Using CVS for Web development
by Philip Greenspun for Web Tools Review
If you have a very clear publishing objective, specs that never change, and one very smart developer, you don't need version control. If you have evolving objectives, changing specifications, and multiple contributors, you need version control.
What's wrong with the two-server plan? Nothing if you are running photo.net circa 1997. The development team consisted of me and Jin. The testing team... me and Jin! Note that there was no possibility of simultaneous development and testing. ArsDigita.com customers, however, usually have enough budget to pay for four or five programmers plus 20 or 30 internal staffers who may be updating content, testing changes, and sometimes contributing code. For a complex site, the publisher may wish to spend a week testing before launching a revision. It isn't acceptable to idle authors and developers while a handful of testers bangs away at the development server. The solution? A staging server, rooted at /web/foobar-staging/ (Server 3).
Here's how the three are used:
drop table users rather than drop table users_experimental_extra_table to the database.
So it would seem that we'll need at least one new Oracle playground. Here are the steps:
The bottom line is that it takes work to keep three Oracle users' objects in sync. It is half as much work to sync two and almost as useful. How to deploy these two Oracle users? Park one behind the production server. Use the other one behind the dev and staging servers.
CVS does all of this via its repository or "CVS root". This is a directory, typically /usr/local/cvsroot/. Most Unix machines don't have enough space in the /usr partition to store all Web content. Remember that the CVS root will be at least as large as all of the files under source control. Thus we will use /cvsweb as our CVS root and, if need be, migrate it to a separate disk subsystem.
Create a project from your development Web sources (from /web/foobar-dev/) so that they will end up at /cvsweb/foobar/.
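A minimal sketch of that project-creation step, written as a Python wrapper around the cvs command line. The paths come from the text above; the vendor tag "arsdigita", the release tag "start", and the function names are placeholders made up for illustration (cvs import insists on both tags but doesn't care what they are).

```python
import subprocess

CVSROOT = "/cvsweb"              # repository location chosen in the text
MODULE = "foobar"                # will end up at /cvsweb/foobar/
SOURCE_DIR = "/web/foobar-dev"   # development Web sources


def init_command(cvsroot=CVSROOT):
    """Command that creates an empty repository at cvsroot."""
    return ["cvs", "-d", cvsroot, "init"]


def import_command(cvsroot=CVSROOT, module=MODULE,
                   vendor_tag="arsdigita", release_tag="start"):
    """Command that imports the current directory as `module`.

    vendor_tag and release_tag are hypothetical placeholder values.
    """
    return ["cvs", "-d", cvsroot, "import",
            "-m", "initial import of " + module,
            module, vendor_tag, release_tag]


def create_project(run=subprocess.run):
    """Create the repository, then import the dev tree into it."""
    run(init_command(), check=True)
    # cvs import reads from the current directory, so run it in the dev tree
    run(import_command(), cwd=SOURCE_DIR, check=True)
```

Note that an import does not turn /web/foobar-dev/ into a working copy; a cvs checkout of the foobar module is still needed before anything can be committed from there.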
Who is really using CVS then? A cron job. Every day just before midnight the cron job should check in all changes from the dev server to the main branch, with the change comment 'nightly check-in YYYY-MM-DD'. The cron job should notify the Release Master if any files that are in the repository have been deleted so that he or she can decide whether the removal was a mistake or if typing cvs remove is warranted (the files don't really go away; they go into an "attic").
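Here is one way the nightly job's two duties might be sketched in Python. It assumes the dev server is an ordinary CVS working copy, so that each directory carries a CVS/Entries file listing the files CVS knows about. The function names are invented for this example, and how the Release Master gets notified (email, pager) is left out.

```python
import os
from datetime import date


def checkin_comment(day):
    """Change comment in the 'nightly check-in YYYY-MM-DD' form."""
    return "nightly check-in " + day.isoformat()


def commit_command(day):
    """Command to run at the top of the dev working copy."""
    return ["cvs", "commit", "-m", checkin_comment(day)]


def missing_files(workdir):
    """Return tracked files (per CVS/Entries) that are gone from disk."""
    missing = []
    entries_path = os.path.join(workdir, "CVS", "Entries")
    if not os.path.isfile(entries_path):
        return missing
    with open(entries_path) as entries:
        for line in entries:
            fields = line.rstrip("\n").split("/")
            if len(fields) < 3:
                continue  # e.g. a bare "D" line with no name
            path = os.path.join(workdir, fields[1])
            if fields[0] == "D":
                # subdirectory entry: descend into it
                missing.extend(missing_files(path))
            elif fields[0] == "" and not fields[2].startswith("-"):
                # file entry; a "-" revision means cvs remove was already run
                if not os.path.exists(path):
                    missing.append(path)
    return sorted(missing)
```

The cron job would call missing_files("/web/foobar-dev") first, report anything it returns to the Release Master, and then run commit_command(date.today()). Keep in mind that cvs commit only picks up files that CVS already knows about; brand-new files need a cvs add first.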
One person is designated the Release Master. Normally this person does nothing. When the publisher is happy with the behavior of the development server, the Release Master creates a CVS branch named "Launch199909" or whatever (CVS tag names must begin with a letter). The Release Master updates the staging server from CVS with this branch. Development proceeds with checkins to the main CVS branch. These won't affect updates from the Launch199909 branch.
Once the staging server has been thoroughly tested, the Release Master checks in any changes that have been made. The check-in happens twice, once to the Launch199909 branch (there won't be any conflicts since nobody has been touching this) and once to the main branch (conflicts may need to be resolved).
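The Release Master's branch commands might look like the sketch below. One wrinkle: CVS requires tag and branch names to start with a letter and to contain only letters, digits, hyphens, and underscores, so a name with the date first (199909Launch) would be rejected and needs reordering (Launch199909). The helper names are made up for this example, and merging with update -j is one common way to carry branch changes back to the main branch, not the only one.

```python
import re

# CVS tag rule: a letter first, then letters, digits, "-" or "_"
TAG_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*$")


def is_valid_tag(name):
    """True if CVS would accept `name` as a tag or branch name."""
    return bool(TAG_PATTERN.match(name))


def branch_command(name):
    """Create the launch branch; run inside the dev working copy."""
    if not is_valid_tag(name):
        raise ValueError("not a legal CVS tag name: " + name)
    return ["cvs", "tag", "-b", name]


def staging_update_command(name):
    """Move the staging working copy onto the launch branch.

    -d creates new directories, -P prunes empty ones.
    """
    return ["cvs", "update", "-d", "-P", "-r", name]


def merge_to_main_command(name):
    """Merge branch changes into a main-branch working copy.

    Follow with a commit once any conflicts are resolved.
    """
    return ["cvs", "update", "-j", name]
```

For example, branch_command("Launch199909") yields the command to type on the dev server, while branch_command("199909Launch") raises ValueError before CVS gets a chance to complain.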
When the publisher decides to go live, the Release Master takes the following steps:
If the Release Master is doing all of this hard work, why do we need to train anyone else in CVS? A Web service is 24x7 but one person can't work 24x7. So we need a Release Apprentice for each Web service who knows everything that there is to know about this system.
The ArsDigita Community System generally contains the following under /web/foobar:
If you're worried about your developers being sloppy and editing files in /web/foobar/ when they thought they were in /web/foobar-dev/ remember that you can always use cvs update to revert the production site to the most recent approved version.
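A caveat worth illustrating: a plain cvs update tries to merge repository changes into locally edited files; it does not throw the local edits away. The sketch below first previews which production files were touched and then builds an update with the -C flag, which overwrites locally modified files with clean repository copies. As far as I know -C arrived in CVS 1.11, so on older installations the fallback is to delete the edited files and update. The function names are invented for this example.

```python
def preview_command():
    """Dry run: -n changes nothing, -q trims the chatter.

    The output carries one status letter per file.
    """
    return ["cvs", "-n", "-q", "update"]


def locally_modified(preview_output):
    """Pick out files flagged M (edited locally) or C (would conflict)."""
    files = []
    for line in preview_output.splitlines():
        status, _, name = line.partition(" ")
        if status in ("M", "C") and name:
            files.append(name)
    return files


def revert_command(branch=None):
    """Overwrite local edits with clean copies from the repository.

    Pass the launch branch name if the working copy is not already on it.
    """
    command = ["cvs", "update", "-C"]
    if branch is not None:
        command += ["-r", branch]
    return command
```

Feeding the text printed by preview_command() into locally_modified() gives the list of files a sloppy developer edited in /web/foobar/, which is worth looking at before running revert_command() and wiping out the evidence.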
Suppose that you've ample money for server hardware, co-location fees, and sysadmin resources. You probably want to split the production machine out and only give the Release Master and Release Apprentice access to that box. Let the developers and staging/testing folks fight it out on a development server.
Compare this to the world of db-backed Web servers. If you want to check out a copy of the tree and play with it, you have to create an Oracle user and tablespace, import a recent Oracle export.dmp file to populate your tablespace with what was on the production site, find a free IP address or port and set up a Web server, and then keep your Oracle table definitions in sync with any alterations other developers may be making.
In the C world, developers live to satisfy themselves. More than likely, not another soul on the planet will ever run the code that they are authoring. So it is fine for them to work alone. In the Web world, developers always work with the publisher and users. Those collaborators will need to be alerted to this new server so that they can offer criticism and advice. They might need special passwords or firewall access since most publishers don't like to let the public see their unfinished development efforts.
In the C world, you've got the luxury of one or two years between product releases. All the work is done by people with at least four years of training. In the Web world, a significant new release may need to be produced in four weeks. Much of the work may be done by people with no formal training of any kind, e.g., designers and content authors editing templates or static .html pages. Given the chronic shortage of personnel in this industry, do you want to limit yourself to being able to hire only those who've been through a CVS training course? To those who are formally minded enough to read the CVS man pages? Remember that most of the contributors on your site will not be programmers.
The bottom line? It is just too much work to set up each contributor with his or her own little server.
If you are setting up a new CVS server, spend a few extra minutes to configure CVS in client-server ("pserver") mode instead of the older file system mode. This will save you pain later and may keep you out of hot water. Pain, because moving the repository (your old one dies, your company IPOs and your boss wants to buy a big fancy server farm, you want to hide the repository behind a firewall) is then a matter of changing an environment variable. You also get immediate access control (developers can be prevented from updating the production environment). CVS in file system mode can "hang" because it leaves a lock file around for each file and directory. Then you need a CVS guru to dive in and fix it. One note: you can't live in a mixed environment. It is either one mode or the other.
An expert tip on using client-server mode: CVS uses gzip for compressing data across the network. The default setting is -z3, which is a pitiful waste of time. Recompile CVS to use -z9 by default (the network is the bottleneck, not CPU resources), or add it to everyone's .cvsrc configuration file (it lives in each user's home directory).
I've had some extremely painful experiences with CVS and large binary files. (Large is 32 MB or more.) When CVS checks a file out of the repository, even if it is doing nothing more than a straight copy (no diff'ing, merging, etc.), the program brings the whole file into contiguous memory. This bloats the CVS process resident set size to at least the size of the file, plus about 6 MB for the program, give or take. The process is inefficient, so subsequent large files don't reuse the space well. CVS bloats even more. Make sure that your server is configured with a lot of swap space (it should have a lot of memory anyway). Even so, performance will be dragged into the ground until CVS is finished (could be 30 minutes for a large working set), then things will "mysteriously" return to normal.
-- Ken Mayer, July 23, 1999
Your proposed once-per-day automatic check-in of everything is a nice idea for a group such as your ArsDigita company with its fairly non-standard mission statement.
In more mundane companies, however, you usually have at least one mid-level manager who will see the amount of code checked in every day as a measurement of individual employee efficiency, and wreak all sorts of havoc with this misguided "knowledge".
I'm sure some of you have experienced mid-level managers who were too dumb to even figure out how to do this, but I have never been that unlucky ;-)
Apart from this, your proposed method sounds remarkably similar to what I have been doing for various db-backed websites over the last few years. It has proven itself to me to be a great time saver and I don't even want to calculate how many near disasters, with their associated all-night fix-up sessions, it has saved me or my co-workers from.
The pserver is surely the only way to share CVS among a group of people without running into all sorts of non-interesting problems with NFS etc. You can also tunnel it through ssh for secure over-the-net operations.
-- Kristian Elof Sørensen, July 24, 1999
Regarding putting the stuff in /parameters - the .ini files - under CVS, and requiring different .ini files for your three servers: this is a darn good reason to use Tcl configuration files in AOLserver 3.0 instead of .ini files. Then the config file can use Tcl to determine whether it's a production, dev, or staging server (based on an environment variable, or the server home, etc.), and use the appropriate config values.
-- Rob Mayoff, February 26, 2000
Although my company does not use CVS, we have used Microsoft's Visual Source Safe and Intersolv's PVCS Version Manager. Both were a pain to set up and to get people to use. All complaints usually go away the first time that version control saves your day after some screwup.
As for the managers, they usually don't care. Obviously some misguided soul is going to use this tool to gather information on who worked on what and for how long, but around the office 99% of the people are interested in it because it saves us from many headaches.
I don't think I ever want to work on a project without some kind of version control.
-- Pedro Vera-Perez, March 14, 2000
When dealing with large teams of developers, using CVS can be a real headache. One alternative would be BitKeeper, which solves most (if not all) of CVS's problems. It was written by the guys that did Sun's TeamWare source management system.
-- Petru Paler, April 18, 2000
I know the above is 7 years old, and version control has been gaining acceptance all the time, so what I want to add may already be obvious to most readers.
First, everything I do that is worth saving is under version control, either in CVS, or (preferably) in Subversion, which is short for "CVS with the glaring problems fixed". A version control server is best regarded as part of the regular IT infrastructure, like a file server, a webserver or a mail server. The university department where I work (Math & CS) maintains a Subversion server for all employees to use; it's very popular and works very well. Used mainly for source code, websites, and scientific papers.
Second, version control can be thought of as a tool for collaboration, but I use it more for structuring my own work. I commit my changes into version control whenever they represent some meaningful unit of change: not sooner, not later. So my commits usually correspond to specific tasks, with specific objectives that the changes were designed to achieve, and specific results - objectives met, objectives failed, new issues found. To get a task-oriented overview of the work I did on a project in the past year, I just read back the log of messages I typed with each commit. To refresh my memory on why a particular change was made, I look it up in version control and read the accompanying commit message.
It also works the other way: I can set objectives and start making the necessary changes until I have acceptable results, knowing that if half way through things turn out uglier than expected, I'll just revert to the last committed version and start over, or postpone the objective in question.
So version control really helps me structure my work. It structures my work into transactions. This benefit is lost with automated commits. The one conceptual hurdle to learning how to work with CVS or Subversion is that users must learn to think of all their edits as being parts of transactions that need to be explicitly committed or rolled back. Once they learn this it can be a big asset.
-- Reinier Post, June 17, 2006