Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science

6.916: Software Engineering of Innovative Web Services
Problem set 4

This is an obsolete version of problem set 4. The current version is available at http://philip.greenspun.com/teaching/psets/ps4/ps4.adp.
Reading for this week:

Online assistance: 6.916 Q&A forum

Objectives

Teach students the virtues of metadata. More specifically, they learn how to formally represent the requirements of a Web service and then build a computer program to generate the computer programs that implement that service.

The Huge Picture

Organizations have complex requirements for their information systems. They also have insane schedules and demand that you build their system within weeks. Finally, they are fickle and have no compunction about changing the requirements mid-stream.

The Big Picture

Corporations all have knowledge management systems even though generally they may not have any knowledge. Universities claim to have knowledge and yet none have knowledge management systems. You are going to build a knowledge management system for your university. In order to ensure that you can get a job after this course is over, you must refer to your final product as "a KM system".

Another issue is a perennial side-show in the world-wide computer programming circus: the spectacle of nerds arguing over programming tools. The data model can't represent the information that the users need, the application doesn't do what the users need it to do, and instead of writing code, the "engineers" are arguing about Java versus Lisp versus Perl versus Tcl. If you want to know why computer programmers get paid less than medical doctors, consider the situation of two trauma surgeons arriving at an accident scene. The patient is bleeding profusely. If surgeons were like programmers, they'd leave the patient to bleed out in order to have a really satisfying argument over the merits of two different kinds of tourniquet.

If you're programming one Web page at a time, you can switch to the Language du Jour in search of higher productivity. But you won't achieve significant gains unless you stop writing code one page at a time. You need to think about ways to write down a formal description of the application and user experience, then let the computer generate the application automatically.

The Medium-Sized Picture

Knowledge is text, authored by a user of the community. The user may attach a document, photograph, or spreadsheet to this text. Other users can comment on the knowledge, submitting text and optional attachments of their own. What distinguishes a knowledge management system from the standard /bboard module of the ACS? The following:

A data model to represent your data model

Business people like to talk about "objects" rather than "tables". This doesn't mean that the MBA curriculum includes inheritance and method combination in the Common Lisp Object System. What seems to have happened is that
  1. Xerox PARC and MIT developed object-oriented programming systems in the 1970s, including Smalltalk and the Lisp Machine
  2. People hailed these systems as rather advanced
  3. The business community finally picked up the object-oriented buzz during the 1990s
When a business person talks about an "object", what he or she generally means is "a row in a relational database table".

In order to make your system comprehensible to the CIO/CTO types who will be adopting it, we'll use their vocabulary. A table row is an "object" and each column is an "element" of that object.

First, we need a way to represent the kinds of objects that our system will handle. Let's assume that we'll have at least the following object types:

To say that "Joe Squigglesworth is a really boring lecturer in Classics 101 but Jane Bartlett is excellent", the user would create three objects: two of type person and one of type class. The content of Classics 101 would be described as one of the elements of the class object and the comments about the professors' performance could be comments on the person objects or on the class object.

For each object type we'll be creating an Oracle table. For each Oracle table we create, we store one row in the metadata table:


create table km_metadata_objects (
        table_name              varchar(21) primary key,
	-- use this to build the mapping table
	really_short_name	varchar(10),
        pretty_name             varchar(100) not null,
        pretty_plural           varchar(100)
);
We need to store information about what elements will be kept for each type of object. Note that some elements are common across object types:

You won't be writing code to implement fancy permissioning so not all of this information will be useful in this problem set. However, it is good to have it in your data model even if you aren't going to build .tcl pages to update or query it. Notice that you may need to define extra tables to support some of this many-to-one information, e.g., the access control list for an object. Some of these elements can't be modeled in a database column.

For elements that are unique to an object type, we need one row in a metadata table per element. Note that we also use this table to represent links from object type to object type (abstract_data_type of "mapping"):


create table km_metadata_elements (
        metadata_id             integer primary key,
        table_name              not null references km_metadata_objects,
        column_name             varchar(30) not null,
        pretty_name             varchar(100) not null,
        abstract_data_type      varchar(30) not null,   -- e.g., "text", "shorttext", "boolean", "mapping", "user"
	-- this one is not null except when abstract_data_type is "mapping" or "user"
        oracle_data_type        varchar(30),   -- "varchar(4000)"
        -- e.g., "not null" or "check foobar in ('christof', 'patrick')"
        extra_sql               varchar(4000),
        -- values are 'text', 'textarea', 'select', 'radio', 
	-- 'selectmultiple', 'checkbox', 'checkboxmultiple', 'selectsql'
        presentation_type       varchar(100) not null,
        -- e.g., for textarea, this would be "rows=6 cols=60", for select, Tcl list,
        -- for selectsql, an SQL query that returns N distinct values
        -- for email addresses mailto:
        presentation_options    varchar(4000),
        -- pretty_name is going to be the short prompt, 
	-- e.g., for an update page, but we also need something
	-- longer if we have to walk the user through a long form
        entry_explanation       varchar(4000),
	-- if they click for yet more help 
        help_text               varchar(4000),
        -- note that this does NOT translate into a "not null" constraint in Oracle
        -- if we did this, it would prevent users from creating rows incrementally
        mandatory_p             char(1) check (mandatory_p in ('t','f')),
        -- ordering in Oracle table creation, 0 would be on top, 1 underneath, etc.
        sort_key                integer,
        -- ordering within a form, lower number = higher on page 
        form_sort_key           integer,
        -- if there are N forms, starting with 0, to define this object, 
	-- on which does this go?  (relevant for very complex objects where
	-- you need more than one page to submit)
        form_number             integer,
        -- for full text index
        include_in_ctx_index_p  char(1) check (include_in_ctx_index_p in ('t','f')),
        -- add forms should be prefilled with the default value
        default_value           varchar(200),
        -- if the abstract_data_type is mapping, the table to which 
	-- we're mapping
        map_to_which_table_name  references km_metadata_objects,
	check ((abstract_data_type not in ('mapping','user') and oracle_data_type is not null)
                or
              (abstract_data_type in ('mapping','user'))),
        unique(table_name,column_name)
);
Does it still seem odd that a mapping between objects, which will be represented in a separate SQL table, should be present as a row in km_metadata_elements? Keep in mind that a major function of this table is to specify the user interface. At the very least, the mapping needs to be represented here so that your programs will know where on an input form to solicit a mapped object.

Exercise 1: Use prototype builder to construct admin pages for your metadata

Create a directory /admin/km/ under your Web server page root. Go to a Unix shell and type

> cd /web/yourservername/www/admin/
> chmod a+w km
This makes the /admin/km/ directory writable by the Web server. Now use the prototype builder (documented at http://photo.net/doc/prototype.html; available on your own server at /admin/prototype/) to generate admin pages for the km_metadata_elements and km_metadata_objects tables.

Exercise 2: Fill your metadata tables with info

Fill your metadata tables with some info. Add one entry to the objects table for each of the object types listed above. For each object type, fill in some elements. Remember that each object will have the default fields specified above so you don't need things like name or overview. Here are some examples:

for the person type: date_of_birth, title
for the class type: prerequisites, professors (link to objects of type person)
for the document type: links to every possible other kind of object
for the technology type: manufacturer, model number
for the research_sponsors type: program officers (links to objects of type person)
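
As a rough sketch of what the resulting rows might look like (you would normally enter them through the prototype-builder admin pages rather than by hand), here are two object types and two elements. The table names persons and classes and the short names person and class are taken from the mapping-table example later in this problem set. The sequence km_metadata_id_seq, the pretty names, the varchar(100) width, and the choice of 'selectsql' for presenting a mapping are assumptions made for illustration only.

```sql
-- hypothetical sequence for metadata_id; the problem set does not define one
create sequence km_metadata_id_seq;

insert into km_metadata_objects (table_name, really_short_name, pretty_name, pretty_plural)
values ('persons', 'person', 'Person', 'People');

insert into km_metadata_objects (table_name, really_short_name, pretty_name, pretty_plural)
values ('classes', 'class', 'Class', 'Classes');

-- an ordinary element: title on the person type
insert into km_metadata_elements
  (metadata_id, table_name, column_name, pretty_name, abstract_data_type,
   oracle_data_type, presentation_type, mandatory_p, sort_key, form_sort_key,
   form_number, include_in_ctx_index_p)
values
  (km_metadata_id_seq.nextval, 'persons', 'title', 'Title', 'shorttext',
   'varchar(100)', 'text', 'f', 0, 0, 0, 't');

-- a mapping element: the professors of a class link to objects of type person;
-- oracle_data_type stays null, which the check constraint permits for 'mapping'
insert into km_metadata_elements
  (metadata_id, table_name, column_name, pretty_name, abstract_data_type,
   presentation_type, mandatory_p, form_sort_key, form_number,
   map_to_which_table_name)
values
  (km_metadata_id_seq.nextval, 'classes', 'professors', 'Professors', 'mapping',
   'selectsql', 'f', 1, 0, 'persons');
```

The objects rows must go in first because km_metadata_elements.table_name and map_to_which_table_name both reference km_metadata_objects.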

Exercise 3: Write a program to generate DDL statements

Write a script called /admin/km/generate-ddl.tcl that will generate CREATE TABLE statements from the metadata tables. It can simply output this to the Web browser with a MIME type of text/plain (then you can save it to your local file system as km-generated.sql) or write it back into the Unix file system as /admin/km/km-generated.sql.

Each object table should have an object_id column. Use a single Oracle sequence to generate keys for this column.
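
For orientation, the output of generate-ddl.tcl for the person type might look something like the following. This is only a sketch: the sequence name km_object_id_seq and the column data types are assumptions, and the columns for elements common to all object types are omitted.

```sql
-- one sequence shared by every object table (the name is an assumption)
create sequence km_object_id_seq;

-- built from the km_metadata_elements rows whose table_name is 'persons',
-- in sort_key order; each column is column_name followed by oracle_data_type
-- and then extra_sql, if any
create table persons (
        object_id       integer primary key,
        date_of_birth   date,
        title           varchar(100)
);
```

Because mandatory_p deliberately does not become a "not null" constraint, the generated columns stay nullable unless extra_sql says otherwise.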

Remember that for every unique pairing of tables through an element of type "mapping", you'll have to create a mapping table to represent the many-to-many relation. Call these "km_t1rsn_t2rsn_map" where "t1rsn" is the "really short name" of table 1 and "t2rsn" is the "really short name" of table 2. For simplicity, assume that associations are bidirectional, e.g., if you associate the class English 101 with the English department, then the English department is associated with the class. You'll want to build a primary key constraint into the map table definition but also, for query efficiency, create a concatenated index on the columns in the other order. (The trees chapter of SQL for Web Nerds, at http://photo.net/sql/trees.html, gives some examples of concatenated indices. Also read the composite indices section of the Oracle Tuning manual: http://philip.greenspun.com/sql/ref/composite_indices. See also the Oracle SQL Reference section at http://philip.greenspun.com/sql/ref/create_index.)

Another good idea for the mapping table is to include a map_comment column where you can store a user-entered reason for relating two objects. Here is an example table definition:

create table km_person_class_map (
	person_id	not null references persons,
	class_id 	not null references classes,
	map_comment	varchar(4000),
	creation_user	not null references users,
	creation_date	date default sysdate not null,
	primary key(person_id, class_id)
);
In this particular case, you could use the map_comment column to distinguish between a professor and a student being associated with a class.
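
The concatenated index "in the other order" for this example table could be sketched as follows. The index name is an assumption, and the object_id value in the sample query is a made-up placeholder.

```sql
-- the primary key already indexes (person_id, class_id), which serves
-- "which classes is this person linked to?"; this second index serves
-- lookups that start from the class instead
create index km_person_class_map_idx on km_person_class_map (class_id, person_id);

-- a query that can use the new index: everyone linked to class 1042
select person_id, map_comment
from km_person_class_map
where class_id = 1042;
```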

Exercise 4: Write a program to generate a "drop all tables" script

Write a script called /admin/km/generate-drop-tables.tcl that will generate DROP TABLE statements from the metadata tables. You probably won't get your data model right the first time so you might as well be ready to clear out Oracle and start over.
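
As a minimal sketch, assuming the persons, classes, and km_person_class_map tables used as examples in Exercise 3, the generated script might contain statements like the ones below. The final query shows one way to produce the object-table statements directly from the metadata; your script would do the equivalent in Tcl and would also have to work out the mapping table names.

```sql
-- mapping tables go first because they hold foreign keys
-- that reference the object tables
drop table km_person_class_map;

-- then the object tables themselves
drop table persons;
drop table classes;

-- the object-table statements above can be produced by a query such as
select 'drop table ' || table_name || ';'
from km_metadata_objects;
```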

Exercise 5: Build the knowledge capture pages

Create a directory /km under the Web server page root. The index.tcl page should display an unordered list of object types and, next to each type, options to "browse" or "create". You don't have any information in the database, so you should build /km/object-create.tcl first. This page will query the metadata tables to build a data entry form to create a single object of a particular type. Build /km/object-create-2.tcl to process the results of this form. You may find util_prepare_update from /tcl/00-ad-utilities.tcl useful in building object-create-2.tcl.
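
The metadata query behind object-create.tcl might look roughly like this. It is a sketch that assumes the form is built one form_number at a time and that 'persons' is the table_name passed to the page; the Tcl code would then loop over the rows and emit one form widget per element according to presentation_type.

```sql
-- all elements that belong on the first form for the person type,
-- in the order they should appear on the page
select column_name, pretty_name, abstract_data_type, presentation_type,
       presentation_options, entry_explanation, default_value, mandatory_p
from km_metadata_elements
where table_name = 'persons'
and form_number = 0
order by form_sort_key;
```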

When object-create-2.tcl is done inserting the row into the database, it should ns_returnredirect to object-display.tcl. This page should have small hyperlinks to edit single fields at a time (all linking to object-edit-element.tcl with different arguments). This page should show all the currently linked objects and have "add link" hyperlinks to object-add-link.tcl.

The page returned by object-add-link.tcl will look virtually identical to /km/index.tcl and will in fact link to the same URL: object-browse-one-type.tcl. When called with only table_name, this page will display a table of object names with dimensional controls at the top. The dimensions should be "mine|everyone's" and "creation date". The user ought to be able to click on a table header and sort by that column.

When called with extra arguments, object-browse-one-type.tcl will pass those arguments through to object-view-one.tcl and, if the user clicks a confirmation button, will eventually result in object-add-link-2.tcl being invoked. The extra arguments should be link_to_table_name and link_to_object_id.

Exercise 6: Gather statistics

You want to know when people are looking at and reusing knowledge. Create a table to hold object views:

-- we will be updating the reuse_p column of views so it 
-- will be easier to have a primary key 
create sequence km_object_view_id;

create table km_object_views (
	object_view_id	integer primary key,
	-- which user
	user_id		not null references users,
	-- two columns to specify which object 
	object_id	integer not null,
	table_name	varchar(21) not null,
	view_date	date not null,
	reuse_p		char(1) default 'f' check(reuse_p in ('t','f'))		
);
Modify object-view-one.tcl so that you explicitly close the TCP connection to the user (using ns_conn close). This will cause the Netscape icon to stop spinning but the AOLserver thread will remain alive so that you can log.

After the ns_conn close, insert a row into the km_object_views table iff there isn't already a log row for this user/object pair within 24 hours. You could do this with the following steps:

  1. open a transaction
  2. lock the table
  3. count the number of matching rows within the last 24 hours
  4. compare the result to 0 and insert if necessary
  5. close the transaction
However, you can also do this with a single ns_db dml statement. Here's an example of an INSERT statement that only has an effect if there isn't already a row in the table.
insert into msg_id_generator (last_msg_id)
select ('000000') from dual
where 0 = (select count(last_msg_id) from msg_id_generator);
Apply this example to the problem of thread-safe logging if and only if there isn't an identical row logged within the last 24 hours.

Date/time arithmetic: see http://photo.net/sql/dates.html.
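
One possible shape for the single-statement version is sketched below. The literal values (user 37, object 1042, table persons) are placeholders that the Tcl page would substitute; everything else comes from the km_object_views definition above. In Oracle, subtracting 1 from a date yields the same time one day earlier, so "sysdate - 1" means 24 hours ago.

```sql
-- inserts a view row only when this user has not already been logged
-- viewing this object within the last 24 hours
insert into km_object_views (object_view_id, user_id, object_id, table_name, view_date)
select km_object_view_id.nextval, 37, 1042, 'persons', sysdate
from dual
where 0 = (select count(*)
           from km_object_views
           where user_id = 37
           and object_id = 1042
           and table_name = 'persons'
           and view_date > sysdate - 1);
```

When a matching row exists, the subquery count is nonzero, the select from dual returns no rows, and nothing is inserted.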

Exercise 7: Gather more statistics

Modify object-view-one.tcl to add an "I reused this knowledge" button. This should link to object-mark-reused.tcl, a page that updates the reuse_p flag of the most recent relevant row in km_object_views. The page should raise an error if it can't find a row to update.

Exercise 8: Explain the concurrency problem in Exercise 7

Explain the concurrency problem in Exercise 7 and talk about ways to address it.

Exercise 9: Do a little performance tuning

Create an Oracle index on km_object_views that will make the code in exercises 6 and 7 go fast.

Exercise 10: Display Statistics

Build /admin/km/statistics.tcl to show, by day, the number of objects viewed and reused. This report should be broken down by object type and all the statistics should be links to "drill-down" pages where the underlying data are exposed, e.g., which actual users viewed or reused knowledge and when.
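
A starting point for the top-level report might be a query along these lines. It is a sketch, not a required design: it counts logged views rather than distinct objects, and the column aliases are assumptions.

```sql
-- one row per day and object type, with view and reuse counts
select trunc(view_date) as view_day,
       table_name,
       count(*) as n_views,
       sum(decode(reuse_p, 't', 1, 0)) as n_reused
from km_object_views
group by trunc(view_date), table_name
order by trunc(view_date) desc, table_name;
```

The trunc call strips the time portion of view_date so that all views on the same calendar day fall into one group. A drill-down page could rerun the same restriction (one day, one table_name) without the grouping to list the individual users and times.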

Exercise 11: Build a site-wide index

Using the methods outlined in http://photo.net/doc/site-wide-search.html, build an index of the content in the KM system objects.

Create a file /admin/km/generate-sws-triggers.tcl to read the metadata tables and

  1. generate database triggers that will automatically put indexable object content into the site_wide_index table
  2. insert one row into sws_table_to_section_map for every object type defined in km_metadata_objects
Add some new objects and verify that they are being automatically copied by the triggers into the index table.

Exercise 12: Query pages for the site-wide index

Create a page /km/search.tcl that displays a form letting the user search for a phrase either through all objects or only in one type of object. For /km/search-2.tcl you'll either want to use Intermedia or the DBMS_LOB functions and the pseudo_contains source code from http://software.arsdigita.com/www/doc/sql/pl-sql.sql.

See http://philip.greenspun.com/sql/ref/dbms_lob and http://philip.greenspun.com/sql/ref/intermediatext as references.

Optional

Show your system to a business school professor.

Who Wrote This and When

This problem set was written by Philip Greenspun in October 1999 for MIT Course 6.916. It is copyright 1999 Philip Greenspun but may be reused provided credit is given to the original author with a hyperlink to this document.

It is permanently housed at http://philip.greenspun.com/teaching/psets/ps4/ps4.adp.


Maintainer: teadams@mit.edu