Chapter 10: Sites That Are Really Programs
by Philip Greenspun, part of Philip and Alex's Guide to Web Publishing
Revised July 2003
You needn't turn your Web site into a program just because the body of material that you are publishing is changing. Sites such as http://dir.yahoo.com, for example, are sets of static files that are periodically generated by programs grinding through a dynamic database. With this sort of arrangement, the site inevitably lags behind the database but you can handle millions of requests a day without a major investment in computer hardware, custom software, or thought.
If you want to make a collaborative site, however, at least some of your Web pages will have to be computer programs. Pages that process user submissions have to add user-supplied data to your Web server's disk. Pages that display user submissions have to look through a database on your server before delivering the relevant contributions.
In older times, even if you wanted to publish completely static, non-collaborative material, at least one portion of your site would require server-side programming: the search engine. To provide a full-text search over the material on a site, the server would have to take a query string from the user, compare it to the files on the disk, and then return a page of links to relevant documents. Nowadays most people who want to do this would instead build a form targeting Google with hidden form variables that restrict the search to the originating site's domain (see http://www.google.com/searchcode.html to set this up).
This chapter discusses the options available to Web publishers who need to write program-backed pages. Here are the steps:
Every interesting Web site has some characteristics of both a document and a computer program. There is thus no correct answer to the question "Is your site a hypertext document with bits of computation or a computer program with bits of static text?" However, the tools that make it easy for a team of experts to develop a computer program will get in the way if your site is fundamentally a document. Conversely, the tools that make it convenient to edit a document can lead to sloppy and error-filled computer programs.
The server-side programming system that takes the document model to its logical extreme is Microsoft Active Server Pages (ASP). A vanilla HTML file is a legal ASP document. If you want to add some computation, you weave in little computer language fragments, surrounded by <% ... %>. If you want to fix a typo or a programming bug, you edit the .adp or .asp file and hit reload in your Web browser to see the new version. Almost always, the connection is direct and immediate between the URL where the problem was observed and the file on the server that you must edit. You don't have to understand much of the document's structure to fix a bug.
At the other end of the document/program spectrum are various "application servers" that require you to program in C or Java. HTML text is inevitably buried inside these programs. Fixing a typo requires editing the program, compiling the program, and reloading the compiled code into the Web or application server. If there is a problem with a URL, fixing it might require reading and editing dozens of program files and understanding most of the program's overall structure.
With the right tools and programmer resources, you can build a jewel-like software system to sit behind a Web site. But ask yourself whether the entire service isn't likely to be redesigned after six months, and if, realistically, your site isn't going to be thrown together hastily by overworked programmers. If so, perhaps it will be best to look for the tightest development cycle.
Consider these aspects:
"PHP is a pet peeve of mine. They spent ungodly hours inflicting their
own new scripting language on the world that is almost exactly like
Perl."
-- a programmer friend who has suffered through 10 years of gratuitous changes in open-source Web development tools
How could a lame string-oriented scripting language possibly compete in power with systems programming languages? Well, guess what? The only data type that you can write to a Web browser is a string. And all the information from the relational database management system on which you are relying comes back to the Web server program as strings. So maybe it doesn't matter whether your scripting language has an enfeebled type system.
Are these languages really the best? Computer scientists can't believe that a scripting language could be as good as Lisp and better than Java for developing Internet applications. But it turns out to be almost true. A scripting language is better than Java because a scripting language doesn't have to be compiled. A scripting language can be better than Lisp because string manipulation is simpler. For example, in Tcl or Perl

    "posted by $email on $posting_date."

will generate a string from the fragments of static ASCII above plus the contents of the variables $email and $posting_date. These were presumably recently pulled from a relational database. The result might look something like

    "posted by philg@mit.edu on February 15, 1998."

In Common Lisp, you'd have

    (concatenate 'string "posted by " email " on " posting-date ".")

which uses a fabulously general mechanism for concatenating sequences. concatenate can work on sequences of ASCII characters (strings) or sequences of TCP packets or sequences of three-dimensional arrays or sequences of double-precision complex numbers. Sequences can either be lists (fast to modify) or vectors (fast to retrieve). This kind of flexibility, which Java apes, is wonderful except that Web programmers are concatenating strings 99.99 percent of the time and the scripting languages' syntactic shortcuts make code easier to read and more reliable.
If your source of persistent storage were an object database, which can directly represent complex types, a language such as C#, Common Lisp, or Java would be very useful for writing individual Web pages. But in our current world, which is overwhelmingly dominated by the relational database management system and its three types (string, number, and date), these languages add very little power.
Finally, don't forget that even if you're developing individual pages in a scripting language, you can write substrate programs in C#, Java, PL/SQL, and other more complex languages. If you're using the Microsoft .NET environment, methods of C# classes can be invoked by any VB.NET program. If you want to do sophisticated computation on information that comes from the relational database, typically that can be done by a Java or PL/SQL program running inside Oracle or by a C# program running inside Microsoft SQL Server.
The oldest mechanism for program invocation via the Web is the Common Gateway Interface (CGI). The CGI standard is an abstraction barrier that dictates what a program should expect from the Web server (for example, user form input) and how the program must return characters to the Web server program for them to eventually be written back to the Web user. If you write a program with the CGI standard in mind, it will work with any Web server program. You can move your site from Apache to Microsoft Internet Information Services (IIS) and all of your CGI scripts will still work. You can give your programs away to other Web publishers who aren't running the same server program. Of course, if you wrote your CGI program in C and compiled it for a Linux box, it might not run so great on a Windows XP machine.
Oops.
We've just discovered why most CGI scripts are written in Perl, PHP, Tcl, or some other interpreted computer language. The systems administrator can install the Perl or Tcl interpreter once and then Web site developers on that machine can easily run any script that they download from another site.
Fixing a bug in an interpreted CGI script is easy. A message shows up in the error log when a user accesses "http://yourserver.nerdu.edu/bboard/subject-lines.pl". If your Web server document root is at /web, then you know to edit the file /web/bboard/subject-lines.pl. After you've found the bug and written the file back to the disk, the next time the page is accessed the new version of the subject-lines Perl script will be interpreted.
For concreteness, let's summarize Unix CGI:
#!/usr/contrib/bin/perl

# the first line in a Unix shell script says where to find the
# interpreter. If you don't know where perl lives on your system, type
# "which perl", "type perl", or "whereis perl" at any shell
# and put the result after the #!

print "Content-type: text/html\n\n";

# now we have printed a header (plus two newlines) indicating that the
# document will be HTML; whatever else we write to standard output will
# show up on the user's screen

print "<h3>Hello World</h3>";

This example program will print "Hello World" as a level-3 headline. If you want to get more sophisticated, read some on-line tutorials or CGI Programming with Perl (Birznieks et al; O'Reilly 2000).
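The Perl example covers only the output half of the CGI contract. As a minimal sketch of the input half, assuming a GET request (for which the CGI convention places the encoded form data in the QUERY_STRING environment variable), here is the same idea in Python. The function name render_response and the form field "name" are made up for illustration.

```python
#!/usr/bin/env python3
# Minimal CGI-style sketch: the Web server passes form input in the
# QUERY_STRING environment variable (for GET requests) and expects
# headers, a blank line, and then the document on standard output.
import html
import os
import sys
from urllib.parse import parse_qs


def render_response(query_string):
    """Build the full CGI response (headers + body) for a query string."""
    form = parse_qs(query_string)
    # "name" is a hypothetical form field used only for this example
    name = form.get("name", ["World"])[0]
    # escape user input so it cannot inject markup into the page
    body = "<h3>Hello " + html.escape(name) + "</h3>"
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(render_response(os.environ.get("QUERY_STRING", "")))
```

Because the script talks to the server only through environment variables and standard output, it does not care which Web server program launched it, which is the portability argument made above.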
It is that easy to write Perl CGI scripts and get server independence, a tight software development cycle, and ease of distribution to other sites. With that in mind, you might ask how many of the thousands of dynamic Web pages on popular Web sites use this program invocation mechanism. The answer? Maybe a couple.
All Web server APIs allow you to specify "If the user makes a request for a URL that starts with /foo/bar/ then run Program X". The really good Web server APIs allow you to request program invocation before or after pages are delivered. For example, you ought to be able to say "When the user makes a request for any HTML file, run Program Y first and don't serve the file if Program Y says it is unhappy". Or "After the user has been served any file from the /car-reviews directory, run Program Z" (presumably Program Z performs some kind of logging).
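To make the three kinds of registration concrete, here is a toy dispatcher in Python. It is a sketch of the idea only: register_proc, register_filter, register_after, and handle are hypothetical names, not the API of any real Web server.

```python
# Sketch of URL-prefix registration plus a "before" filter and an "after" hook.
# Every name here is hypothetical; this imitates the idea, not a real server.

handlers = []       # (url_prefix, function) pairs that generate a page
pre_filters = []    # (url_prefix, function) pairs; a filter returns True to allow
post_hooks = []     # (url_prefix, function) pairs run after a page is served


def register_proc(prefix, func):
    handlers.append((prefix, func))


def register_filter(prefix, func):
    pre_filters.append((prefix, func))


def register_after(prefix, func):
    post_hooks.append((prefix, func))


def handle(url):
    """Return (status, body) for a requested URL."""
    # "run Program Y first and don't serve the file if Program Y is unhappy"
    for prefix, allowed in pre_filters:
        if url.startswith(prefix) and not allowed(url):
            return 403, "Program Y is unhappy"
    # "if the URL starts with /foo/bar/ then run Program X"
    response = (404, "not found")
    for prefix, func in handlers:
        if url.startswith(prefix):
            response = (200, func(url))
            break
    # "after the user has been served, run Program Z" (for example, logging)
    if response[0] == 200:
        for prefix, hook in post_hooks:
            if url.startswith(prefix):
                hook(url)
    return response
```

A post-hook registered for "/car-reviews/" could append each served URL to a log, which is the role Program Z plays in the description above.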
Sometime in mid-1994 the researchers depending on Martigny, whose load average had soared from 0.2 to 3.5, decided that a 100,000-hit-per-day Web site was something that might very nicely be hosted elsewhere. It was easy enough to find a neglected HP Unix box, which we called swissnet.ai.mit.edu. And we sort of learned our lesson and did not distribute this new name in the URL but rather aliases: "www-swiss.ai.mit.edu" for research publications of our group (known as "Switzerland" for obscure reasons); "photo.net" for photo stuff; "pgp.ai.mit.edu" for Brian's public key server.
But what were we to do with all the hard-wired links out there to martigny.ai.mit.edu? We left NCSA 1.3 loaded on Martigny but changed the configuration files so that a request for "http://martigny.ai.mit.edu/foo/bar.html" would result in a 302 redirect being returned to the user's browser so that it would instead fetch http://www-swiss.ai.mit.edu/foo/bar.html.
Two years later, in August 1996, someone upgraded Martigny from HP-UX 9 to HP-UX 10. Nobody bothered to install a Web server on the machine. Email began to trickle in "I searched for you on the Web but your server has been down since last Thursday." Eventually we figured out that the search engines were still sending people to Martigny, a machine that was in no danger of ever responding to a Web request since it no longer ran any program listening to port 80.
Those were the early days of Apache and we couldn't get it to compile. We downloaded an expensive commercial Web server made by a now-defunct company called "Netscape". It was a $5000 product, free to universities, with a built-in redirect facility but sadly there was a bug in the program and the redirects didn't work. Because the product was closed-source we couldn't fix it ourselves. Finally we installed AOLserver, which did not have a neat redirect facility, but its Tcl API seemed flexible enough that it would be possible to make the server do whatever we wanted.
First, we tell AOLserver to feed all requests to a Tcl procedure instead of looking around in the file system:
ns_register_proc GET / martigny_redirect
This is a Tcl function call. The function being called is named ns_register_proc. Any function that begins with "ns_" is part of the NaviServer Tcl API (NaviServer was the name of the program before AOL bought NaviSoft in 1995). ns_register_proc takes three arguments: method, URL, and procname. In this case, the code says that HTTP GETs for the URL "/" (and below) are to be handled by the Tcl procedure martigny_redirect:
proc martigny_redirect {} {
    append url_on_swissnet "http://www-swiss.ai.mit.edu" [ns_conn url]
    ns_returnredirect $url_on_swissnet
}
This is a Tcl procedure definition, which has the form "proc procedure-name arguments body". martigny_redirect takes no arguments. When martigny_redirect is invoked, it first computes the full URL of the corresponding file on Swissnet. The meat of this computation is a call to the API procedure "ns_conn" asking for the URL that was part of the request line.

With the full URL computed, martigny_redirect's second body line calls the API procedure ns_returnredirect. This writes back to the connection a set of 302 redirect headers instructing the browser to rerequest the file, this time from "http://www-swiss.ai.mit.edu".
# tell AOLserver to watch for PDF file requests under the /ejournal directory
# if we don't add additional ns_register_filter commands, all the
# other files will be available to everyone
ns_register_filter preauth GET /ejournal/*.pdf ejournal_check_auth

proc ejournal_check_auth {args why} {
    # all the parameters we might want to change
    set user "open"
    set passwd "sesame"
    # on the real-life server, these are pulled from a relational database
    # but here for an example, let's just set it to MIT and Stanford
    set allowed_ip_ranges [list "18.*" "36.*"]
    foreach pattern $allowed_ip_ranges {
        if { [string match $pattern [ns_conn peeraddr]] } {
            # a paying customer; the file will be sent
            return "filter_ok"
        }
    }
    # not coming from a special IP address, let's check the
    # username and password headers that came with the request
    if { [ns_conn authuser] == $user && [ns_conn authpassword] == $passwd } {
        # they are an authorized user; the file will be sent
        return "filter_ok"
    }
    # not a good IP address, no headers, hammer them with a 401 demand
    ns_set put [ns_conn outputheaders] WWW-Authenticate "Basic realm=\"MIT Press:Restricted\""
    ns_returnfile 401 text/html "[ns_info pageroot]ejournal/please-subscribe.html"
    # stop AOLserver from handling the request by returning a special code
    return "filter_return"
}
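The decision logic inside that filter (IP pattern first, then HTTP Basic credentials, else a 401) is independent of AOLserver. A minimal Python sketch of just that logic follows; check_auth and parse_basic_auth are hypothetical helper names, and where AOLserver hands the Tcl code an already-decoded user and password, this sketch decodes the Authorization header itself.

```python
import base64
from fnmatch import fnmatchcase

ALLOWED_IP_PATTERNS = ["18.*", "36.*"]   # same example ranges as the Tcl code
USER, PASSWD = "open", "sesame"


def parse_basic_auth(header_value):
    """Return (user, password) from an Authorization header, or None."""
    if not header_value or not header_value.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(header_value[6:]).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def check_auth(peer_addr, authorization=None):
    """Return 200 if the file may be served, otherwise 401."""
    # glob-style matching, like Tcl's "string match"
    if any(fnmatchcase(peer_addr, p) for p in ALLOWED_IP_PATTERNS):
        return 200
    if parse_basic_auth(authorization) == (USER, PASSWD):
        return 200
    # caller should send a WWW-Authenticate header along with the 401
    return 401
```

A real deployment would pull the patterns and credentials from a database, as the comment in the Tcl code notes, rather than hard-coding them.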
"For me grad school is fun just like playing Tetris all night is fun. In the morning you realize that it was sort of enjoyable, but it didn't get you anywhere and it left you very very tired."Computer science graduate students earn a monthly stipend that wouldn't cover the average yuppie's SUV payments and gas bill. If you've been reading Albert Camus lately ("It is a kind of spiritual snobbery to think one can be happy without money") then you'd expect this to lead to occasional depression. For these depressed souls, there is Career Guide for Engineers and Scientists (http://philip.greenspun.com/careers/).
-- Michael Booth's comment on the philip.greenspun.com "Women in Computing" page
Thought 1: starving graduate students forgoing six years of income would be cheered to read the National Science Foundation report that "Median real earnings remained essentially flat for all major non-academic science and engineering occupations from 1979-1989. This trend was not mirrored among the overall work force where median income for all employed persons with a bachelor's degree or higher rose 27.5 percent from 1979-1989 (to a median salary of $28,000)."
Thought 2: custom photography would help get the message across (see ).
Thought 3: we could really get under the skin of America's best and brightest young computer scientists with Aid to Evaluating Your Accomplishments (see ).
Here's the source code:
# a helper procedure to pick N items randomly from a list
# note that it uses tail-recursion, importing a little bit
# of the clean Scheme philosophy into the ugly world of Tcl
proc choose_n_random {choices_list n_to_choose chosen_list} {
    if { $n_to_choose == 0 } {
        return $chosen_list
    } else {
        set chosen_index [randomRange [llength $choices_list]]
        set new_chosen_list [lappend chosen_list [lindex $choices_list $chosen_index]]
        set new_n_to_choose [expr $n_to_choose - 1]
        set new_choices_list [lreplace $choices_list $chosen_index $chosen_index]
        return [choose_n_random $new_choices_list $new_n_to_choose $new_chosen_list]
    }
}

# we encapsulate the printing of an individual person so that
# one day we can easily change the design of the page (we display
# four people at once and putting this in a procedure keeps us from
# having to edit the same code four times).
proc one_person {person} {
    set name [lindex $person 0]
    set title [lindex $person 1]
    set achievement [lindex $person 2]
    return "<h4>$title $name</h4>\n
$achievement
<br><br>
<center>
(<a href=\"http://altavista.digital.com/cgi-bin/query?pg=q&what=web&fmt=&q=[ns_urlencode $name]\">more</a>)
</center>\n"
}

# we return HTTP headers to the client
ReturnHeaders

# we return as much of the page as we can before figuring out which four
# people we're going to display; this way if we were going to query a
# relational database (potentially taking 1/2 second), the user would
# have something on-screen to read
ns_write "<html>
<head>
<title>Aid to Evaluating Your Accomplishments</title>
</head>
<body bgcolor=#ffffff text=#000000>
<h2>Aid to Evaluating Your Accomplishments</h2>
part of <a href=\"/philg/careers.html\">Career Guide for Engineers and Scientists</a>
<hr>
Compare yourself to these four ordinary people who were selected at random:
<br>
<br>
"

# each person is name, title, accomplishment(s)
set einstein [list "A. Einstein" "Patent Office Clerk" \
    "Formulated Theory of Relativity."]
set mill [list "John Stuart Mill" "English Youth" \
    "Was able to read Greek and Latin at age 3."]
set mozart [list "W. A. Mozart" "Viennese Pauper" \
    "Composed his first opera, <i>La finta semplice</i>, at the age of 12."]
set jesus [list "Jesus of Nazareth" "Judean Carpenter" \
    "Told young women he was God and they believed him."]
set stevens [list "Wallace Stevens" "Hartford Connecticut Insurance Executive" \
    "Won Pulitzer Prize for Poetry in 1954; best known for \"Thirteen Ways of Looking at a Blackbird\"."]
# ... there are a bunch more in the real live script

set average_folks [list $einstein $mill $mozart $jesus]

# we call our choose_n_random procedure, note that we give it an empty
# list to kick off the tail-recursion
set four_average_folks [choose_n_random $average_folks 4 [list]]

ns_write $conn "<table cellpadding=20>
<tr>
<td valign=top>
[one_person [lindex $four_average_folks 0]]
</td>
<td valign=top>
[one_person [lindex $four_average_folks 1]]
</td>
</tr>
<tr>
<td valign=top>
[one_person [lindex $four_average_folks 2]]
</td>
<td valign=top>
[one_person [lindex $four_average_folks 3]]
</td>
</tr>
</table>
"

# note how in the big block of static HTML below, we're forced to
# put backslashes in front of the string quotes. This is annoying
# and we wouldn't have to do it if we'd implemented this using
# AOLserver Dynamic Pages (where the text is HTML by default,
# Tcl code by exception).
ns_write $conn "
<p>
Programmed by <a href=\"http://www.ugcs.caltech.edu/~eveander/\">Eve Astrid Andersson</a>
and <a href=\"/philg/\">Philip Greenspun</a> in
<a href=\"/wtr/servers.html#naviserver\">AOLserver Tcl</a>.
If you're a nerd, you might find
<a href=\"four-random-people.txt\">the source code</a> useful.
<P>
Original Inspiration: <cite>How to Make Yourself Miserable</cite>, by Dan Greenburg
<hr>
<a href=\"/philg/\"><address>philg@mit.edu</address></a>
</body>
</html>
"
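The heart of this page is choose_n_random: pick an item at random, remove it from the pool so it cannot be picked twice, and repeat. Here is a hedged Python sketch of the same selection-without-replacement idea, written as a loop rather than with the tail recursion of the Tcl original.

```python
import random


def choose_n_random(choices, n):
    """Pick n distinct items from choices, in random order."""
    remaining = list(choices)      # copy so the caller's list is untouched
    chosen = []
    for _ in range(n):
        # choose a position, then remove that item from the pool
        index = random.randrange(len(remaining))
        chosen.append(remaining.pop(index))
    return chosen
```

In everyday Python one would simply call random.sample(choices, n), which does the same job; the explicit version is shown only because it mirrors the steps of the Tcl procedure.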
The forms user interface model fell into the shade after 1984 when the Macintosh "user drives" pull-down menu system was introduced. However, HTML forms as classically conceived work exactly like the good old 3270. Here's an example that is firmly in the 3270 mold, taken from the Lens chapter of my photography tutorial textbook (http://www.photo.net/making-photographs/lens). The basic idea is to help people figure out what size lens they will need to buy or rent in order to make a particular image. They fill in a form with distance to subject and the height of their subject (see ). The server then tells them what focal length lens they need for a 35mm camera.
Here's the HTML source for the form:
<form method=post action=focal-length.tcl>
How far away is your subject?
<input type=text name=distance_in_feet size=7> (in feet)
<p>
How high is the object you want to fill the frame?
<input type=text name=subject_size_in_feet size=7> (in feet)
<p>
<input type=submit>
</form>
Here's the AOLserver Tcl program that processes the user input:
set_form_variables

# distance_in_feet, subject_size_in_feet are the args from the form
# they are now set in Tcl local variables thanks to the magic
# utility function call above

# let's do a little IBM mainframe-style error-checking here
if { ![info exists distance_in_feet] || [string compare $distance_in_feet ""] == 0 } {
    ns_return 200 text/plain "Please fill in the \"distance to subject\" field"
    # stop the execution of this script
    return
}

if { ![info exists subject_size_in_feet] || [string compare $subject_size_in_feet ""] == 0 } {
    ns_return 200 text/plain "Please fill in the \"subject size\" field"
    # stop the execution of this script
    return
}

# we presume that subject is to fill a 1.5 inch long-dimension of a
# 35mm negative

# ahhh... the joys of arithmetic in Tcl, a quality language so
# much cleaner than Lisp
set distance_in_inches [expr $distance_in_feet * 12]
set subject_size_in_inches [expr $subject_size_in_feet * 12]
set magnification [expr 1.5 / $subject_size_in_inches]
set lens_focal_length_inches [expr $distance_in_inches / ((1/$magnification) + 1)]
set lens_focal_length_mm [expr round($lens_focal_length_inches * 25.4)]

# now we return a page to the user, one big string into which we let Tcl
# interpolate some variable values
ns_return $conn 200 text/html "<html>
<head>
<title>You need $lens_focal_length_mm mm </title>
</head>
<body bgcolor=#ffffff text=#000000>
<table>
<tr>
<td>
<a href=\"/images/pcd0952/boston-marathon-46.tcl\"><img HEIGHT=198 WIDTH=132 src=\"/images/pcd0952/boston-marathon-46.1.jpg\" ALT=\"100th Anniversary Boston Marathon (1996).\"></a>
<td>
<h2>$lens_focal_length_mm millimeters</h2>
will do the job on a Nikon or Canon or similar 35mm camera
<P>
(according to the <a href=\"http://www.photo.net/photo/tutorial/lens.html\">photo.net lens tutorial</a> calculator)
</tr>
</table>
<hr>
Here are the raw numbers:
<ul>
<li>distance to your subject: $distance_in_feet feet ($distance_in_inches inches)
<li>long dimension of your subject: $subject_size_in_feet feet ($subject_size_in_inches inches)
<li>magnification: $magnification
<li>lens size required: $lens_focal_length_inches inches ($lens_focal_length_mm mm)
</ul>
Assumptions: You are using a standard 35mm frame (24x36mm) whose
long dimension is about 1.5 inches. You are holding the camera in
portrait mode so that your subject is filling the long side of the
frame. You are supposed to measure subject distance from the optical
midpoint of the lens, which for a normal lens is roughly at the
physical midpoint.
<P>
Source of formula: <a href=\"http://www.photo.net/photo/dead-trees/professional-photoguide.html\">Kodak Professional Photoguide</a>
<br>
Source of server-side programming knowledge: Chapter 9 of <a href=\"http://www.photo.net/wtr/dead-trees/\">How to be a Web Whore Just Like Me</a>
<br>
Time required to write this program: 15 minutes.
<br>
Proof that philg is a nerd: <a href=\"focal-length.txt\">view the source code</a>
<br>
What this is not: a slow Java program that will crash everyone's
browser (except those behind corporate firewalls that block all Java
applets)
<br>
Another thing this is not: a CGI program that will make my poor old
Unix box fork
<br>
Yet another thing this is not: a JavaScript program that you'd think
would be the right thing but then on the other hand it wouldn't work
with some browsers and the last thing that I need is email from
confused users

<h3>Bored? Try again</h3>

<form method=post action=focal-length.tcl>
How far away is your subject?
<input type=text name=distance_in_feet size=7 value=\"$distance_in_feet\"> (in feet)
<p>
How high is the object you want to fill the frame?
<input type=text name=subject_size_in_feet size=7 value=\"$subject_size_in_feet\"> (in feet)
<p>
<input type=submit>
</form>

<h3>European? Macro-oriented?</h3>

<form method=post action=focal-length-mm.tcl>
How far away is your subject?
<input type=text name=distance_in_mm size=7> (in millimeters)
<p>
How high is the object you want to fill the frame?
<input type=text name=subject_size_in_mm size=7> (in millimeters)
<p>
<input type=submit>
</form>
<hr>
<a href=\"/philg/\"><address>philg@mit.edu</address></a>
</body>
</html>"
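Stripped of its HTML, the program is five lines of arithmetic. The formula follows from the thin-lens equation 1/f = 1/d_subject + 1/d_image together with magnification m = d_image/d_subject, which rearranges to f = d_subject / (1/m + 1). As a sketch, here is the same calculation in Python; the function name focal_length_mm is invented for this illustration.

```python
def focal_length_mm(distance_in_feet, subject_size_in_feet, frame_inches=1.5):
    """Focal length (mm) needed so the subject fills the frame's long side."""
    distance_in_inches = distance_in_feet * 12
    subject_size_in_inches = subject_size_in_feet * 12
    # image size on film divided by subject size in the world
    magnification = frame_inches / subject_size_in_inches
    # thin-lens relation rearranged: f = d / (1/m + 1)
    focal_length_inches = distance_in_inches / ((1 / magnification) + 1)
    return round(focal_length_inches * 25.4)
```

For a six-foot subject standing 20 feet away, focal_length_mm(20, 6) gives 124, so a lens of roughly 125mm would do the job.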
Yes, dead trees.
If you aren't in a refereed journal or conference, you aren't going to get tenure. You can't expect to achieve quality without peer review. And peer review isn't just a positive feedback mechanism to enshrine mediocrity. It keeps uninteresting papers from distracting serious thinkers at important conferences. For example, there was this guy in a physics lab in Switzerland, Tim Berners-Lee. And he wrote a paper about distributing hypertext documents over the Internet. Something he called "the Web". Fortunately for the integrity of academia, this paper was rejected from conferences where people were discussing truly serious hypertext systems.
Anyway, with foresight like this, it is only natural that academics like to throw stones at successful unworthies in the commercial arena. The "Why Bill Gates is Richer than You" section on philip.greenspun.com didn't come into its own until the day Brian announced to our little research group at MIT that the U.S. Census Bureau had put up a real-time population clock at http://www.census.gov/cgi-bin/popclock. There had been stock quote servers on the Web almost since Day 1. How hard could it be to write a program that would reach out into the Web and grab the Microsoft stock price and the population, then do the math to come up with what you see at http://philip.greenspun.com/WealthClock (see ).
This program was easy to write because the AOLserver Tcl API contains the ns_httpget procedure. Having a personal server grab a page from the Census Bureau is as easy as
ns_httpget "http://www.census.gov/cgi-bin/popclock"
Tcl the language made life easy because of its built-in regular expression matcher. The Census Bureau and the Security APL stock quote folks did not intend for their pages to be machine-parsable. Yet only a short program was necessary to pull the raw numbers out of a page designed for reading by humans.
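The Tcl program in this section does its scraping with regexp. To show the technique in isolation, here is a hedged Python sketch of the two parsing steps and the final division. The HTML fragment used to exercise it is invented for illustration (the real pages are not reproduced in this chapter), and the function names are not from the original code.

```python
import re
from fractions import Fraction


def raw_quote_to_decimal(raw_quote):
    """Turn a quote such as "99 1/8" into 99.125; plain numbers pass through."""
    # Fraction accepts both "99" and "1/8", so just add up the pieces
    return float(sum((Fraction(part) for part in raw_quote.split()), Fraction(0)))


def parse_population(html_text):
    """Pull a comma-separated number such as 265,123,039 out of an <H1>."""
    match = re.search(r"<H1>[^0-9]*([0-9,]+)", html_text)
    if match is None:
        raise ValueError("no population found")
    # int() reads "039" as decimal 39, so no octal workaround is needed
    return int(match.group(1).replace(",", ""))


def personal_share(stock_price, shares, population):
    """Dollars of one person's wealth per member of the population."""
    # Python integers do not overflow, so no conversion to double is needed
    return stock_price * shares / population
```

Unlike the Tcl helper, raw_quote_to_decimal raises ValueError on text it cannot parse rather than returning a partial answer; whether that is preferable depends on how much you trust the page being scraped.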
Anyway, here is the code. Look at the comments.
# this program copyright 1996, 1997 Philip Greenspun (philg@mit.edu)
# redistribution and reuse permitted under
# the standard GNU license
# this function turns "99 1/8" into "99.125"
proc wealth_RawQuoteToDecimal {raw_quote} {
    if { [regexp {(.*) (.*)} $raw_quote match whole fraction] } {
        # there was a space
        if { [regexp {(.*)/(.*)} $fraction match num denom] } {
            # there was a "/"
            set extra [expr double($num) / $denom]
            return [expr $whole + $extra]
        }
        # we couldn't parse the fraction
        return $whole
    } else {
        # we couldn't find a space, assume integer
        return $raw_quote
    }
}
###
# done defining helpers, here's the meat of the page
###
# grab the stock quote and stuff it into QUOTE_HTML
set quote_html [ns_httpget "http://qs.secapl.com/cgi-bin/qs?ticks=MSFT"]
# regexp into the returned page to get the raw_quote out
regexp {Last Traded at</a></td><td align=right><strong>([^A-z]*)</strong>} \
$quote_html match raw_quote
# convert whole number + fraction, e.g., "99 1/8" into decimal,
# e.g., "99.125"
set msft_stock_price [wealth_RawQuoteToDecimal $raw_quote]
set population_html [ns_httpget "http://www.census.gov/cgi-bin/popclock"]
# we have to find the population in the HTML and then split it up
# by taking out the commas
regexp {<H1>[^0-9]*([0-9]+),([0-9]+),([0-9]+).*</H1>} \
$population_html match millions thousands units
# we have to trim the leading zeros because Tcl has such a
# brain damaged model of numbers and thinks "039" is octal
# this is when you kick yourself for not using Common Lisp
set trimmed_millions [string trimleft $millions 0]
set trimmed_thousands [string trimleft $thousands 0]
set trimmed_units [string trimleft $units 0]
# then we add them back together for computation
set population [expr ($trimmed_millions * 1000000) + \
($trimmed_thousands * 1000) + \
$trimmed_units]
# and reassemble them in a string for display
set pretty_population "$millions,$thousands,$units"
# Tcl is NOT Lisp and therefore if the stock price and shares are
# both integers, you get silent overflow (because the result is too
# large to represent in a 32 bit integer) and Bill Gates comes out as a
# pauper (< $1 billion). We hammer the problem by converting to double
# precision floating point right here.
#
# (Were we using Common Lisp, the result of multiplying two big 32-bit
# integers would be a "big num", an integer represented with multiple
# words of memory; Common Lisp programs perform arithmetic correctly.
# The time taken to compute a result may change when you move from a
# 32-bit to a 64-bit computer but the result itself won't change.)
set gates_shares_pre_split [expr double(141159990)]
set gates_shares [expr $gates_shares_pre_split * 2]
set gates_wealth [expr $gates_shares * $msft_stock_price]
set gates_wealth_billions \
[string trim [format "%10.6f" [expr $gates_wealth / 1.0e9]]]
set personal_share [expr $gates_wealth / $population]
set pretty_date [exec /usr/local/bin/date]
# we're done figuring, now let's return a page to the user
ns_return 200 text/html "<html>
<head>
<title>Bill Gates Personal Wealth Clock</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h2>Bill Gates Personal Wealth Clock</h2>
just a small portion of
<a href=\"http://www-swiss.ai.mit.edu/philg/humor/bill-gates.html\">Why Bill Gates is Richer than You
</a>
by
<a href=\"http://www-swiss.ai.mit.edu/philg/\">Philip Greenspun</a>
<hr>
<center>
<br>
<br>
<table>
<tr><th colspan=2 align=center>$pretty_date</th></tr>
<tr><td>Microsoft Stock Price:
<td align=right> \$$msft_stock_price
<tr><td>Bill Gates's Wealth:
<td align=right> \$$gates_wealth_billions billion
<tr><td>U.S. Population:
<td align=right> $pretty_population
<tr><td><font size=+1><b>Your Personal Contribution:</b></font>
<td align=right>  <font size=+1><b>\$$personal_share</b></font>
</table>
<p>
<blockquote>
\"If you want to know what God thinks about money, just look at the
people He gives it to.\" <br> -- Old Irish Saying
</blockquote>
</center>
<hr>
<a href=\"http://www.photo.net/philg/\"><address>philg@mit.edu</address>
</a>
</body>
</html>
"
So is this the real code that sits behind http://philip.greenspun.com/WealthClock?
Actually, no.
Why the differences? I was concerned that, if it became popular, the Wealth Clock might impose an unreasonable load on the subsidiary sites. It seemed like bad netiquette for me to write a program that would hammer the Census Bureau and Security APL several times a second for the same data. It also seemed to me that users shouldn't have to wait for the two subsidiary pages to be fetched if they didn't need up-to-the-minute data.
Ten lines of Tcl suffice to create a general purpose caching facility that can cache the results of any Tcl function call as a Tcl global variable. This means that the result is stored in the AOLserver's virtual memory space and can be accessed much faster even than a static file. Users who want a real-time answer can demand one with an extra mouse click. The calculation performed for them then updates the cache for casual users.
Does this sound like overengineering? It didn't seem that way when Netscape, then makers of the world's most popular Web browser, put the Wealth Clock on their What's New page for two weeks (summer 1996). The URL was getting two hits per second. Per second. And all of those users got an instant response. The extra load on the Web server was not noticeable. Meanwhile, all the other sites on Netscape's list were unusably slow. Popularity had killed them.
Here are the lessons from this example:
ns_httpget
call.
The idea is that someone will come to the site, look for the name of the author, then click down to find the presentation of interest.
Here's the ADP source code:
<% wimpy_header "Choose Author" %>
<h2>Choose an Author</h2>
in <a href="/"><%=[wimpy_system_name]%></a>
<hr>
Here's a list of users who have public presentations:
<ul>
<%
set db [ns_db gethandle]
set selection [ns_db select $db "select distinct u.user_id, u.last_name, u.first_names, u.email
from wimpy_users u, wimpy_presentation_ownership wpo, wimpy_presentations wp
where u.user_id = wpo.user_id
and wpo.presentation_id = wp.presentation_id
and wp.public_p = 't'
order by upper(u.last_name), upper(u.first_names)"]
while { [ns_db getrow $db $selection] } {
    set_variables_after_query
    ns_puts "<li><a href=\"user-top.adp?user_id=$user_id\">$last_name, $first_names ($email)</a>\n"
}
%>
</ul>
Or you can do a full-text search through all the slides:
<form method=GET action="search.adp">
Query String: <input type=text name=query_string size=50>
<input type=submit value="Submit">
</form>
<% wimpy_footer %>
Note that one is allowed to use arbitrary HTML, including string quotes, at the top level of the file. Note further that there are two escapes to the ADP evaluator. The basic escape is <%, which will execute a bunch of Tcl code for effect. If the Tcl code wants to write some bytes to the browser, it has to call ns_puts. The second escape sequence is <%=, which will execute a bunch of Tcl code and then write the result out to the browser. Generally one uses the <%= style for simple things, e.g., including the system name that is returned from the Tcl procedure wimpy_system_name. One uses the <% style to execute a sequence of Tcl procedures to query the database, etc.
A nice collection of ASP examples at http://philip.greenspun.com/books/panda/aspharvest/ was harvested in just a couple of hours of surfing one night in July 1998. It is a bit interesting that this surfing was done some time after the bug had become common knowledge yet companies such as DIGITAL, Arthur Andersen, and banks had not patched their servers. What is even more interesting is that by July 2003 nearly all of those companies have gone bankrupt or been absorbed.
firewall.asp is amusing because it is DIGITAL's advertisement for their network security products. Similarly GAP Instrument Corp. took the trouble to warn users
"You have reached a computer system providing United States government information. Unauthorized access is prohibited by Public Law 99-474, (The Computer Fraud and Abuse Act of 1986) and can result in administrative, disciplinary or criminal proceedings."
yet had left their ASP pages wide open.
CompuServe gives us a nice simple example with Conf.asp. The goal of the script is to first figure out whether the person browsing is a CompuServe member or not and then serve one of two entirely separate HTML pages. An if statement is thus opened inside one <% %> and closed in another:
<!--#INCLUDE VIRTUAL="/Forums/member.inc"-->
<% if member = 1 then %>
<HTML>
<HEAD>
<TITLE>TW Crime Forum</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
... ** a page for members *** ..
</BODY>
</HTML>
<BR><I>We Update the Forum Directory Weekly. The directory was last updated: Thursday, January 08, 1998</I>
...
</BODY>
</HTML>
<% else %>
<HTML>
<HEAD>
<TITLE>TW Crime Forum</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
... ** a page for non-members ***
</BODY>
</HTML>
<%End If%>
An interesting thing to note about this page is that CompuServe hasn't run their HTML through a syntax checker, which would no doubt have complained about the stuff after the first </HTML> (the extraneous text is the run in the members branch above that begins with "<BR><I>We Update the Forum Directory Weekly").
Let's move on to some db-backed pages.
The folks who built Fulton Bank's site are very enthusiastic about Microsoft:
"The hottest technology to hit the Internet which is actually useable now is Active Server Page scripting. This has given us a number of advantages over the ancient art of CGI. ... Intranets and Extranets where the variety of user machine platforms, processors, etc are an issue ASP can play in nicely."
-- Xspot.com (once apparently a thriving Web development concern, now apparently bankrupt)
Let's see how ASP works for them in process_product.asp, a script that takes a query string and tries to find banking products that match this query string.
<% affcode = 1057 %>
<HTML>
<HEAD>
<TITLE>Fulton Bank</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<BLOCKQUOTE>
<TABLE WIDTH=370 ALIGN="middle">
<TR>
<TD>
<BR>
<IMG SRC="images/header_products.gif"><BR>
<BR>
<BR>
<%
Set Conn=Server.CreateObject("ADODB.Connection")
Conn.Open "FultonAffiliates"
SQL = "SELECT * FROM products WHERE productname LIKE '%" & Request.Form("product") & "%' AND affiliate = '" & affcode & "'"
Set RS = Conn.Execute(SQL)
%>
<TABLE>
<% if RS.EOF then %>
<TR><TD>Sorry No Products Found</TD></TR>
<% end if %>
<% DO UNTIL RS.EOF %>
<TR>
<TD VALIGN="top"><IMG SRC="images/diamond3.gif"></TD>
<TD>
<A HREF="<% = RS("url") %>"><FONT COLOR="blue"><% = RS("productname") %></FONT></A><BR>
<% = RS("shortdesc") %><BR>
<BR>
<BR>
</TD>
</TR>
<% RS.MoveNext %>
<% LOOP %>
</TABLE></BLOCKQUOTE>
</TD>
</TR>
</TABLE>
<%
rs.close
conn.close
%>
<!--#include file="footer.asp"-->
</BODY>
</HTML>
This is some pretty clean code. The programmers have encapsulated the database password in their ODBC connection configuration. Also, rather than just bury the magic number "1057" in the code, they set affcode to it as the very first line of the program. Finally, they've parked the page footer in a centralized footer.asp file that gets included by all of their scripts.
Note that the final example has a major security flaw: it incorporates strings from the user's request directly into the text of a SQL query. This is subject to SQL injection: carefully crafted input could alter the semantics of the query to return more information than intended by the site authors. Real DB applications use parameterized SQL these days.
-- Lee Schumacher, March 9, 2005
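To illustrate the parameterized approach mentioned in the comment above, here is a minimal sketch in Tcl. It assumes the sqlite3 Tcl package rather than the ODBC connection used by the bank's page, and the procedure name find_products and the sample rows are invented for the example. Only the table and column names are borrowed from the ASP query.

```tcl
# Hypothetical sketch: bound parameters instead of string concatenation.
# Assumptions: the Tcl sqlite3 package is installed; the rows below are
# made-up test data, not anything from the real site.
package require sqlite3

sqlite3 db :memory:
db eval {CREATE TABLE products (productname TEXT, affiliate TEXT, url TEXT)}
db eval {INSERT INTO products VALUES ('Checking Account', '1057', '/checking')}
db eval {INSERT INTO products VALUES ('Savings Account',  '1057', '/savings')}
db eval {INSERT INTO products VALUES ('Other Bank Loan',  '2000', '/loan')}

proc find_products {user_input affcode} {
    # The wildcard characters are added in Tcl, then the whole pattern is
    # passed to the database as a single value.
    set pattern "%$user_input%"
    # The braces keep Tcl from substituting anything into the SQL text.
    # SQLite binds :pattern and :affcode to the local variables as values,
    # so quotes typed by the user cannot change the structure of the query.
    return [db eval {
        SELECT productname FROM products
        WHERE productname LIKE :pattern AND affiliate = :affcode
        ORDER BY productname
    }]
}

puts [find_products "Account" 1057]
puts [find_products "x' OR '1'='1" 1057]
```

The first call should list the two matching products for affiliate 1057. The second passes a classic injection string, which is treated as an ordinary search pattern and therefore matches nothing instead of returning every row.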