Massachusetts Institute of Technology
Department Electrical Engineering and Computer Science

6.916: Software Engineering of Innovative Web Services
Basics of Tcl and SQL

Readings:

Online bboard for assistance: Basics of Tcl and SQL Pset Q&A forum

Objectives: By doing this problem set you will learn

This problem set requires you to learn a lot of new software, so make sure you get started early: plan to spend at least two or three sessions on it. There is nothing difficult here, but you need to work through the initial mechanics of using Tcl, SQL, and running the Web server; and this takes time.

Please feel free to use the on-line forum to ask your questions about this problem set or any other class-related problem. You can use the forum to view other people's questions, and provide or view answers. Also, we strongly recommend that you do as much of your beginning work as possible in the supervised labs, where the Lab Assistants on hand to help you.

Finding your Web server

Check to see that your Web server is running by visiting the address for your server -- for example http://lcsweb1.lcs.mit.edu if you are in the course at MIT -- from a Web browser running on your local client machine (not on your Web server machine). If the server is running, you'll see a message with the name of your virtual server, followed by "test page".

Read Using the LCS Web/db Computing Facility and follow the instructions to log on to your Web server machine.

Getting Started with Tcl

Exercise 1: Running Tcl from the shell

Run Emacs on your server machine. Type "m-x shell" to get a Unix shell. Type "tclsh" to start the Tcl shell program. Try a few simple Tcl commands. Also, type "info tclversion" at the tclsh prompt to make sure that you're running Tcl 8.3, the same version that is compiled into AOLserver.

Now define a recursive Fibonacci procedure in Tcl. Execute and test.

Hint: If you're writing Tcl programs of more than two or three lines, you'll find more convenient to type the code into a separate Emacs buffer (set the buffer to to tcl mode using m-x tcl-mode) and cut and paste from there into the Tcl shell buffer. Actually, you'll almost never be running Tcl from the shell in this course, but you will be writing lots of Tcl in Emacs as you create Web pages.

Exercise 2: Running Tcl from an (almost) HTML page

Look at two-plus-two.adp (source). This is an example of the ADP (AOLserver Dynamic Page) templating facility in AOLserver. You can view the source from your web browser, but you should also read it into an emacs buffer, since you will need to edit it. The source code should be in the file /web/yourvirtualserver/www/psets/basics/two-plus-two.adp. If these files are missing from your server machine, download them from ps-basics.tar and put them in /web/yourvirtualserver/www.

Augment the page so that (1) you add a $4000 South American Cichlid aquarium as an option, (2) you use a constructor procedure to build each aquarium element (instead of simply calling list), (3) you add an element to the aquarium for how many of each type of aquarium will be installed (4) you use procedures to extract type, cost and quantity from an aquarium element (instead of simply calling lindex), (5) you print out quantity-dependent subtotals and the grand total at the bottom.

Hint: If the code you write doesn't produce what you expect when you load the page (or, as is common, produces nothing at all), this is likely to be a Tcl error. To see if there was an error, look at the error log for your server, which is in the file /home/aol/log/yourvirtualserver-error.log.

Exercise 3: Simple Tcl pages

Using the Web browser running on your local machine, visit the URL http://yourserveraddress/psets/basics/simple-tcl-page.tcl. Using Emacs running on the server machine, examine the source code for this page in /web/yourvirtualserver/www/psets/basics/simple-tcl-page.tcl. Also look at the source code for the target of the form, which is in /web/yourvirtualserver/psets/basics/tcl/simple-tcl-page-2.tcl. If these files are missing from your server machine, download them from ps-basics.tar and put them in /web/yourvirtualserver/www.

Notice how we use Tcl to read the form variables. Try out the form a couple of times, using your browser. Now debug the regular expression in simple-tcl-page-2.tcl so that it properly handles the names "Tammy Faye Baker" and "William H. Gates III".

Hint 1: It is easier if you don't try to do this in a single regexp. Use if then elseif then elseif ...

Hint 2: regexp has a side-effect. If you use a multi-clause if statement, make sure that you wrap your calls to regexp in braces so that they don't all get evaluated immediately.

Hint 3: Keep in mind that a match succeeds if the pattern matches a substring of the data. If you want to force your pattern to match the entire data item, you'll have to use an appropriate regexp to ensure this.

Exercise 4: Tcl pages that query foreign servers

Using the Web browser running on your local machine, visit the URL http://www.webho.com/WealthClock. Read the discussion of this program in Chapter 10 of Philip and Alex's Guide to Web Publishing. Drawing upon that program as a model, build a new web service that takes the ISBN of a book from a form and then uses ns_httpget to query several online bookstores to find price and stock information and displays the results in an HTML table. Save your program in files called /web/yourvirtualserver/www/psets/basics/books.tcl and books-2.tcl so people can access your service over the web.

We suggest querying barnesandnoble.com and www.1bookstreet.com. Your program should be robust to timeouts, errors at the foreign sites, and network problems. You can ensure this by wrapping a Tcl catch statement around your call to ns_httpget. Test your program with the following ISBNs: 0385494238, 0062514792, 0140260404, 0679762906.

Try adding more bookstores, but you may need to do kludges to make them work. For example, amazon.com and wordsworth.com tend to respond with a 302 redirect if the client doesn't give them a session ID in the query.

Extra credit: From which of the preceding books is the following quote taken?

"The obvious mathematical breakthrough would be development of an easy way to factor large prime numbers."

This would be a good time to take break.

Getting started with SQL*Plus

Start up again with Emacs (you took a break, right?) and start a Tcl shell as before ("M-x shell" then "tclsh"). Type "M-x rename-buffer" to rename the shell to "tcl-shell". Type "M-x shell" to then get a new Unix shell. Rename this buffer "sql-shell". In the SQL shell, type "sqlplus" to start SQL*Plus, the Oracle shell client. It's convenient to work like this using two shells, one for Tcl and one for SQL.

Exercise 5: Talking to Oracle from the shell

Type the following at SQL*Plus to a table for keeping track of the classes you're taking this semester:
create table my_courses (
	course_number	varchar(20)
);
Note that you have to end your SQL commands with a semicolon in SQL*Plus. These are not part of the SQL language and you shouldn't use these when writing SQL in your Tcl programs for AOLserver.

Insert a few rows, e.g.,

insert into my_courses (course_number) values ('6.916');
Commit your changes:
commit;
See what you've got:
select * from my_courses;
One of the main benefits of using an RDBMS is persistence. Everything that you create stays around even after you log out. Normally, that's a good thing, but in this case you probably want to clean up after your experiment:
drop table my_courses;
Quit SQL*Plus with the quit command.

Exercise 6: Tcl pages that talk to Oracle

Look at the file /web/yourvirtualserver/www/psets/basics/tcl/quotations.tcl, which is the source code for a page that displays quotations that have been stored in the Oracle database. Visit this page with your Web browser and you should get an error. The reason for the error is that the program is calling a procedure that doesn't exist: ad_header ("ArsDigita Header"). You can confirm this suspicion by using Emacs to read /home/aol/log/yourvirtualserver-error.log, which is where AOLserver logs any notices or problems.

To get AOLserver to load procedure definitions at server startup, you have to put .tcl files in your server's private Tcl library: /web/yourvirtualserver/tcl/. Create a file called "basics-defs.tcl" in this directory and define the following Tcl procedures:

Reload the quotations.tcl page and you get ... the same error! AOLserver doesn't know that you've added a file to the private library; this is only checked at server startup. Go to a Unix shell and "restart-aolserver yourservername" (this is the big hammer; it kills your server's Unix process so that Unix will restart AOLserver automatically). If restart-aolserver does not come back with "Killing 10234" or some other process ID, you'll know that you did not succeed (perhaps you made a typo when specifying your server name).

Reload the quotations.tcl page and you get ... a slightly different error! The program is trying to query a table that doesn't exist: quotations.

Go back to your sql shell and restart SQL*Plus. Copy the table definition from the comments at the top of the file quotations.tcl and feed this definition to Oracle. Go back to your Web browser and reload the page that previously gave you an error. Things should now work, although the quotations table is empty.

Use the form on the web page to manually add the following quotation, under an appropriate category of your choice: "640K ought to be enough for anybody" (Bill Gates). Note that it would be funnier if our table had a column for recording the date of the quotation (1981) but we purposely kept our data model as simple as possible.

Return to SQL*Plus and select * from the table to see that your quotation has been inserted into the table. The horrible formatting is an artifact of your having declared the quote column to be 4000 characters long.

In SQL*Plus, insert a quotation with some hand-coded SQL. To see the form of the SQL insert command you should use, examine the code on the page quotation_add.tcl. After creating this new table row, do select * again, and you should now see two rows.

Hint: Don't forget that SQL quotes strings using single quote, not double quote.

Now reload the quotations.tcl URL from your Web browser. If you don't see your new quotation here, that's because you didn't type COMMIT; at SQL*Plus. This is one of the big features of a relational database management system: simultaneously connected users are protected from seeing each other's unfinished transactions.

Preloading tables from data files

Now it is time to preload your quotations database with some interesting material, using Oracle's SQL*Loader utility. With Emacs, examine the file
/web/yourvirtualserver/www/psets/basics/tcl/quotation-list.ctl
This is a SQL*Loader control file. You'll look at SQL*Loader more in Oraexercise 1 (below). For now, just invoke SQL*Loader from the Unix shell with the sqlldr command:
sqlldr userid=your-id control=quotation-list.ctl
Here your-id should be your database username followed by a slash, followed by your database password. Reload the quotations.tcl web page, and verify that the new quotations are in the data base. If you check you directory, you should see that SQL*Loader also generated a quotation-list.log file.

Working with AOLserver and Oracle

Study the source code for the pages quotations.tcl and quotation-add.tcl, paying particular attention to how they access the database. Notice how they represent SQL commands as strings in Tcl, which are then used together with appropriate Tcl commands from the database API. You can find some documentation on the API in Brief introduction to database access using AOLServer and the ACS, but the comments in the code files should be reasonably self-explanatory.

Exercise 7: Improving the User Interface for data entry

Go back to the main quotations.tcl page and modify it so that categories entry is done via a select box of existing categories (you will want to use the "SELECT DISTINCT" SQL command). For new categories, provide an alternative text entry box labeled "new category". Make sure to modify quotation-add.tcl so that it recognizes when a new category is being defined.

Hint: In order to simplify debugging, you may find it useful to test your query using SQL*PLUS in the shell, before integrating the query into a Tcl page.

Exercise 8: Searching

Add a small form at the top of quotations.tcl that takes a single query word from the user. Build a target for this form that returns all quotes containing the specified word. Your search should be case-insensitive and also look through the authors column.

Hint 1: Read about simple queries in SQL for Web nerds.

Hint 2: like '%foo%'

Hint 3: SQL's UPPER and LOWER functions

Personalizing Web services with cookies

We'd like you to build a system that implements per-user personalization of the quotation database. The overall goal should be

You can personalize Web services with the aid of magic cookies. A cookie issued by the server directs the browser to store data in browser's computer. To issue a cookie, the server includes a line like

Set-Cookie:  cookie_name=value; path=/ ; expires=Fri, 01-Jan-2010 01:00:00 GMT
in the HTTP header sent to the browser. Here cookie_name is the name for this cookie, and value is the associated value, which can contain any character or format except for semicolon, which terminates a cookie. The path specifies which URLs on the server the cookie applies to. Designating a path of slash (/) includes all URLs on the server.

After the browser has accepted a server's cookie, it will include the cookie name and value as part of its HTTP requests whenever it asks that server for an applicable URL. Your Tcl programs can read this information using the AOLServer API

[ns_set get [ns_conn headers] Cookie]
After the expiration date, the browser no longer sends the cookie information. The server can also issue cookies with no specified expiration date, in which case, the cookie is not persistent -- the browser uses it only for that one session.

You can see an example of how cookies are issued and read, by visiting the URL http://yourvirtualserver/psets/basics/tcl/set-cookies.tcl and examining the Tcl for file and the associated URLs check-cookies.tcl and expire-cookies.tcl. Observe how expire-cookies gets rid of cookies by reissuing them with an expiration date that has already past.

Reference: The magic cookie spec is available from http://home.netscape.com/newsref/std/cookie_spec.html.

Exercise 9

Implement the personalized quotation system described above.

Hint 1: It is possible to build this system using an ID cookie for the browser and keeping the set of killed quotations in Oracle. However, if you're not going to allow users to log in and claim their profile, there really isn't much point in keeping data on the server. In fact, by keeping killed quotation IDs in your users' browser cookies, you've achieved the holy grail of academic database management system researchers: a distributed database!

Hint 2: It isn't strictly copacetic with the cookie spec, but you can have a cookie value containing spaces. Tcl stores a list of integers internally as those numbers separated by spaces. So the easiest and simplest way to store the killed quotations is as a space-separated list.

Hint 3: Don't filter the quotations in Tcl. It is generally a sign of incompetent programming when you query more data from Oracle than you're going to display to the end-user. SQL is a very powerful query language. You can use the NOT IN feature to exclude a list of quotations.

How about taking another break?

Sharing data with XML

As you learned above from querying bookstores, data on the Web have not traditionally been formatted for convenient use by computer programs. The bookstore program you wrote could easily break if Barnes and Noble changed the formatting of its web page to indicate the book price in some other way.

In theory, people who wish to exchange structured data over the Web can cooperate using XML (eXtensible Markup Language), a 1998 standard from the Web Consortium (www.w3.org/XML/). XML has started to become widely hyped over the past year as "the next big thing on the Web", but in practice, hardly anybody uses XML yet (summer 2000). Fortunately for you in completing this problem set, you can cooperate with your fellow students: the overall goal is to make quotations in your database exportable in a structured format so that other students' applications can read them.

In order to cooperate, we need (1) an agreed-upon URL at everyone's server where the quotations database may be obtained; and (2) an agreed-upon format for the quotations. In point of fact, we could avoid the need for prior agreement by setting up infrastructures for service discovery and by employing techniques for self-describing data -- both of which we'll deal with later in the semester -- but we'll keep things simple for now.

We'll format our quotations using XML. XML structures consist of data strings enclosed in HTML-like tags of the form <foo> and </foo>, describing what kind of thing the data is supposed to be.

Here's an informal example, showing the structure we'll use for our quotations:

<quotations>
  <onequote>
    <quotation_id>1</quotation_id>
    <insertion_date>1999-02-04</insertion_date>
    <author_name>Bill Gates</author_name>
    <category>Computer Industry Punditry</category>
    <quote>640K ought to be enough for anybody.</quote>
  </onequote>
  <onequote>
  .. another row from the quotations table ...
  </onequote>
  ... some more rows
</quotations>
Notice that there's a separate tag for each column in our SQL table:
<quotation_id>
<insertion_date>
<author_name>
<category>
<quote>
There's also a "wrapper" tag that identifies each row as a <onequote> structure, and an outer wrapper that identifies a sequence of <onequote> structures as a <quotations> document.

Building a DTD

We can give a formal description of our XML structure, rather than an informal example, by means of an XML Document Type Definition (DTD).

Our DTD will start with a definition of the quotations tag:

<!ELEMENT quotations (onequote)+>
This says that the quotations element must contain at least one occurrence of onequote but may contain more than one. Now we have to say what constitutes a legal onequote element:
<!ELEMENT onequote (quotation_id,insertion_date,author_name,category,quote)>
This says that the sub-elements, such as quotation_id must each appear exactly once and in the specified order. Now we have to define an XML element that actually contains something other than other XML elements:
<!ELEMENT quotation_id (#PCDATA)>
This says that whatever falls between <quotation_id> and </quotation_id> is to be interpreted as raw characters rather than as containing further tags (PCDATA stands for "parsed character data").

Here's our complete DTD:

<!-- quotations.dtd -->
<!ELEMENT quotations (onequote)+>

<!ELEMENT onequote (quotation_id,insertion_date,author_name,category,quote)>

<!ELEMENT quotation_id (#PCDATA)>
<!ELEMENT insertion_date (#PCDATA)>
<!ELEMENT author_name (#PCDATA)>
<!ELEMENT category (#PCDATA)>
<!ELEMENT quote (#PCDATA)>
The point of building a DTD is not just to satisfy some anal, formalistic craving of Web protocol designers. Rather, the DTD provides a machine-readable description of the XML data structure that can be fed to parsers that can then automatically tokenize the XML document. (This is a simple example of a self-describing data technique.)

In this problem set, however, we won't use the DTD at all. You'll need to use only the informal XML quotations example.

Exercise 10: Generating XML

Create a Tcl program that queries the quotations table, produces an XML document in the preceding form, and returns it to the client with a MIME type of "application/xml". Place this in a file quotations-xml.tcl, so that other users can retrieve the data by visiting that agreed upon URL.

To get you started, we've provided the file example-xml.tcl. Requesting this URL with a Web browser should offer to let you to save the document to a local file, and you can then examine it with a text editor on your local machine. (Alternatively, your browser may already have some special behavior defined for MIME type application/xml. IE 5.5, for example, will automatically display the XML document.) The differences between our example and your program is that you'll need to produce a document containing the entire table and you'll need to generate it on the fly.

Exercise 11: Importing XML

Write a program to import the quotations from another student's XML output page (if you have completed the previous exercise and your peers have not, this might be a good time to exhort them to greater efforts). Your program must

Hints: You might want to set up a temporary table using create table quotations_temp as select * from quotations and then drop it after you're done debugging, so that you don't mess up your own quotations database.

Rather than having you work with DTDs and general XML parsers, we've gone for simplicity here by predefining for you a parser in Tcl that understands only this particular quotations XML structure. The procedure is parse_all. You have to install this file in your server's private Tcl library, /web/yourvirtualserver/tcl/, for this function to be callable by .tcl and .adp pages. (Don't forget to restart your Web server.) The parse_all procedure takes an XML quotation structure as argument and returns a Tcl list, showing the parts and subparts of the structure. To see an example of the format, use your browser to visit the page http://yourvirtualserver/psets/basics/tcl/xml-parse-test.tcl.

Note: these exercises are designed to familiarize you with XML. In most cases, XML processing should be done using Oracle's Java libraries. See http://www.arsdigita.com/doc/xml.html.

The Wide World of Oracle

We're now going to shift gears into a portion of the problem set designed to teach you more about Oracle and SQL.

Oraexercise 1: SQL*Loader

Oraexercise 2: Copying data from one table to another

This exercise exists because we found that, when faced with the task of moving data from one table to another, people were dragging the data from Oracle into AOLserver, manipulating it in Tcl, then pushing it back into Oracle. This is not the way! SQL is a very powerful language and there is no need to bring in any other tools if what you want to do is move data around within Oracle.

Oraexercise 3: JOIN

With a single SQL statement JOINing my_stocks and stock_prices, produce a report showing symbol, number of shares, price per share, and current value.

Oraexercise 4: OUTER JOIN

Insert a row into my_stocks. Run your query from Oraexercise 3. Notice that your new stock does not appear in the report. This is because you've JOINed them with the constraint that the symbol appear in both tables.

Modify your statement to use an OUTER JOIN instead so that you'll get a complete report of all your stocks, but won't get price information if none is available.

Mini-Project

Here's a little project that combines some of the mechanics you've mastered in this problem set.

Choose a web page that provides some numerical data that is updated reasonably frequently, e.g., a couple of times a day. Two examples are:

Write a Tcl procedure that gets this data by scraping it from the appropriate page, as you did earlier with book prices, and stuffs it into a SQL table that collects the data values and the times of access. Use the AOLserver API call ns_schedule_proc to schedule your procedure to run once an hour. Build a .tcl page that people can access to see how data change over time.

One of the things you might do is to keep track of when the server you are accessing is up. For example, one of the interesting things about Amazon is that they often lose control of their server farm and database. (They write a lot of C code and one programmer's sloppiness can generate a catastrophic failure of the entire service.) You might want to build your system so that you can record (a) times when the server is unreachable, and (b) timed for which page served some error message. For example, Amazon will send "Our store is closed temporarily for scheduled maintenance". (You'll sometimes get this during the middle of weekdays when they would definitely not have intentionally scheduled any maintenance.)

You can also display your data graphically as a chart. You can do this in a way that is reasonably browser independent, by using single-pixel GIFs and WIDTH and HEIGHT tags now. Grab the software in http://software.arsdigita.com/tcl/ad-graphing.tcl and put it in your server's private Tcl directory (/web/yourservername/tcl/). Read the docs at http://software.arsdigita.com/www/doc/graphing.html and then write code to generate a pretty chart of your data.

You might also provide a way to return your data as an XML structure, so that people can access it in building their own programs, without going through the same page-scraping shenanigans that you had to in collecting it. The idea that the Web could contain lots of data sources designed for programmatic access, so that people can combine them to build more elaborate programs, hints at the vision of Web services, about which we will have more to say anon.

Place your project on your web server on a page called project.tcl in your problem set directory, so that others can access it.

Turning in your work

We expect you to have your code working and debugged before the beginning of the class when it is due. In class, we will use a web browser to connect to several student servers and all look together at the pages, to see how well they work. Also, starting immediately after class, we will examine the contents of your Web server to look at your answers. On Thursday night, you will be required to do a code review with a TA in lab.

Who Wrote This and When

This problem set was written by Philip Greenspun and Hal Abelson in January 1999 for MIT Course 6.916. It was revised in January 2000 for AOLserver 3.0, which incorporates Tcl 8.3, and revised again in August 2000, with help from Dan Parker, Andrew Grumet, and Ravi Jasuja. The old version is available from http://philip.greenspun.com/teaching/psets/basics/tcl/6916.ps1-for-aolserver-2.3.tar.

This file is permanently housed at http://philip.greenspun.com/teaching/psets/tcl-version/basics/basics.adp.

This material is copyright 1999, 2000, by Philip Greenspun and Hal Abelson. It may be copied, reused, and modified, provided credit is given to the original authors with a hyperlink to this document.


Last Modified: August 30 2000, 2:20AM.
Maintainer: teadams@mit.edu