How to Write a Claim Construction Tutorial: One Example
Topics addressed in this tutorial include the following:
- What is a database application?
- Client/server database applications
- What is a Web-based database application?
- How Web-based database applications were built in the 1990s
- Approaches to automatic generation of application code in the 1990s
- Hardware and software required to host a Web-based database application
- U.S. Patent 6,832,226 (the "'226 Patent")
I. What is a relational database management system (RDBMS)?

Information within an RDBMS is stored in tables. Communication with the RDBMS is via the Structured Query Language (SQL), which includes statements such as CREATE TABLE, INSERT (add a row), UPDATE (change a data item within a table), DELETE (remove a row), and SELECT (return a report). The "SQL schema" or "data model" is a collection of CREATE TABLE statements that prepares the RDBMS to accept data items.
For example, suppose that a Web site developer wished to record subscribers to an email newsletter. He or she would create a new table called mailing_list with two columns (also sometimes called "fields"), email and name, both text strings that can be up to 100 characters in length ("varchar(100)"):

create table mailing_list ( email varchar(100), name varchar(100) );
After a series of INSERT commands, the contents of this table may be viewed in a spreadsheet format, e.g., by typing a SELECT statement such as

select name, email from mailing_list

which returns:

name                email
Philip Greenspun    philg@mit.edu
Bill Gates          billg@microsoft.com
Scott Adams         scottadams@aol.com
Each row in the table is also referred to as a record. Note that, unlike with a desktop spreadsheet application, every data item in the same column, e.g., email, must be of the same data type (in this case a character string).
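The statements above can be exercised with any SQL database. Here is a minimal runnable sketch using Python's built-in sqlite3 module; SQLite is laxer about declared column types than a full-scale RDBMS, but the SQL statements themselves are the same:

```python
import sqlite3

# An in-memory SQLite database stands in for the RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("create table mailing_list ( email varchar(100), name varchar(100) )")

# INSERT adds a row; UPDATE changes a data item; DELETE removes a row.
conn.execute("insert into mailing_list values ('philg@mit.edu', 'Philip Greenspun')")
conn.execute("insert into mailing_list values ('billg@microsoft.com', 'Bill Gates')")
conn.execute("update mailing_list set name = 'William Gates' "
             "where email = 'billg@microsoft.com'")

# SELECT returns a report.
for name, email in conn.execute("select name, email from mailing_list"):
    print(name, email)
```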
II. What is a database application?
A database application is a computer program whose
primary purpose is entering information into and retrieving
information from a computer-managed database. Some of the earliest
database applications were accounting systems and airline reservation
systems such as SABRE,
developed between 1957 and 1960 by IBM and American Airlines.
Users interacted with early database applications, such as SABRE, by typing at a terminal connected directly to a mainframe computer running the database management system. The IBM 3270 terminal, introduced in 1972, was a very commonly used device. The terminal had no ability to process information, but merely displayed characters or "screens" sent from the mainframe.
Software development for an early database application was simply
software development for the mainframe computer, which executed all of
the program code centrally. Software was developed in assembly
language, a "second-generation language", or one of the
"third-generation languages" developed in the late 1950s and early
1960s, e.g., COBOL, Fortran, or PL/I. Whatever conventional programming language was used, database commands in a specialized query language may have been embedded within it.
III. Client/server database applications
A client/server database application is a system in which the programs
to organize data and interact with users have been split up among a
central server and distributed personal computers. With the
development of the inexpensive personal computer in the 1980s, it did
not make sense to have a central mainframe do all of the
work. Programmers realized that substantial performance and
interactivity improvements could be achieved by distributing some of
the logic, especially software related to user interface, to machines
on users' desktops. The central computer (server) was
retained, but it ran only the database management system (DBMS). The
personal computers on users' desktops (clients) ran the
software that displayed menus and reports, drew graphs and charts, and
sent queries and transactions to the DBMS. Instead of screens and
keystrokes being communicated from the central computer to the
desktop, the desktop PC would send queries to the DBMS and receive
results back in exchange. As the rise of the desktop PC coincided with
the rise of the relational database management system (RDBMS),
conceived in 1970 and introduced commercially in 1979 by Oracle
Corporation, the most common language flowing between clients and
servers was SQL.
IV. What is a Web application?
Rather than using the IBM 3270 terminal protocol, the Web sends the specification for the screen or "page" to be displayed in Hypertext Markup Language (HTML), which is a specification or format for a document (analogous to the format in which Microsoft Word, for example, might save a file to disk). HTML is structured data, distinct from a programming language, which generally either expresses a step-by-step algorithm for a computer to follow or specifies a computation to be performed.
Here is an example of a one-sentence .html page:
The source HTML behind this page is a file whose entire contents is the character string "Texas is really big." In other words, a plain-text string of characters can be rendered by a Web browser as an HTML document. Every word appears in the default font and face.
Suppose the author wishes to emphasize the word "really":
This is accomplished by surrounding the word with an "emphasize" tag: "Texas is <em>really</em> big." The Google Chrome browser, used to create the above screen shot, after reading the EM tag, elected to display the word "really" in italics. That's the markup part of HTML. A typical HTML document looks more like the following:
Here's the HTML behind the preceding page:
<html>
<head>
<title>Facts about Texas</title>
</head>
<body bgcolor=white text=black>
<h2>Texas Facts</h2>
<hr>
Texas became a state in 1845.
<h3>Geography</h3>
Texas has an area of 268,820 square miles.
<p>
... a new paragraph ...
<h3>More</h3>
<a href="http://en.wikipedia.org/wiki/Texas">Wikipedia article on Texas</a>
</body>
</html>
After the BODY, which represents the start of the document to be displayed, "Texas Facts" is marked up as an H2 headline. The HTML then specifies a horizontal rule (HR) across the page, includes a sentence about Texas's statehood, and then marks up the word "Geography" as a smaller H3 headline. At the bottom is a link to Wikipedia using the A tag. This is the hypertext part of HTML. As noted above, HTML is structured data representing a document. It is not a programming language and lacks the most basic capabilities for programming, e.g., it is not possible to instruct a computer to add two numbers via HTML. There is no "ADD" tag.
Here's the Perl program that generated the preceding page:
#!/usr/bin/perl
# the (above) first line says where to find the interpreter
# for the rest of the program
# any time we print, the output goes to the user's browser
print "Content-type: text/plain\n\n";
# now we have printed a header (plus two newlines) indicating that the
# document will be plain text; whatever else we write to standard
# output will show up on the user's screen
# Get the current time as a string.
$current_time = localtime();
# Run a program to get the system status.
$system_status = `/usr/bin/uptime`;
print "As of $current_time, system status is:\n\n$system_status\n";
Note that the output displayed by the browser is a combination of
static text, e.g., "As of" and "system status is:", and information
obtained or calculated by the running Perl program. All of this is
happening on the server; all that the browser knows is that it
received a stream of plain text characters (based on the "text/plain"
Content-type sent by the first line of code in the Perl program).
The structure of one of the page scripts shown in Figure 2 is best
explained by example. The flow chart below uses an example of a bank
customer logging in to view the most recent transactions to his or her
account. After typing in a username and password, the customer clicks
on a "view transactions" link. This causes the browser to request the
link "/transactions" from the Web server, which will run a page script
(short computer program) in response. The page script will connect to
the RDBMS, query the transactions table, and then merge the data items
received back from the database with some fixed HTML (a "template") in
order to build a complete HTML page to return to the customer. The
final HTML page as viewed by the customer will contain some
boilerplate content from the HTML template, but also a list of recent
transactions that have been queried from the RDBMS.
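A page script of this sort can be sketched in a few lines of Python. The table layout, column names, and template below are hypothetical stand-ins for whatever a real bank's system would use:

```python
import sqlite3

# Hypothetical transactions table, standing in for the bank's RDBMS.
db = sqlite3.connect(":memory:")
db.execute("create table transactions (account text, posted text, amount text)")
db.executemany("insert into transactions values (?, ?, ?)", [
    ("alice", "2011-03-01", "-42.50"),
    ("alice", "2011-03-03", "+1200.00"),
])

def transactions_page(conn, account):
    # Query the RDBMS for this customer's recent transactions...
    rows = conn.execute(
        "select posted, amount from transactions where account = ? order by posted",
        (account,))
    items = "\n".join(f"<li>{posted}: {amount}</li>" for posted, amount in rows)
    # ...then merge the data items with fixed HTML (the "template").
    return ("<html><body><h2>Recent transactions</h2>\n<ul>\n"
            + items + "\n</ul></body></html>")

print(transactions_page(db, "alice"))
```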
A critical part of almost any SQL schema is a set of integrity
constraints. These are restrictions on the data that will be enforced
by the database management system. Without integrity constraints, a
company's valuable data may gradually become inconsistent as a
consequence of small mistakes in computer programs that use the
database. In the case of the example one-table data model, the
programmer has added two integrity constraints on the email column,
as described in Part VI below.
Examples of tools that enabled people to develop "screens" or "forms"
without using a standard programming language included Oracle Forms, dating back
to the early 1980s, and PowerBuilder, a popular tool for building
graphical user interfaces (GUIs) introduced in 1991. Generally these
products were called "Fourth Generation Languages" (4GL), to
distinguish them from "third-generation languages" such as COBOL and
Fortran. A more or less complete list of such languages and tools may
be found at http://en.wikipedia.org/wiki/4GL. Because
a standard relational database management system can be queried
programmatically to determine the names and structure of user-defined
tables, an early standard feature of these 4GL tools was the automatic
generation of basic data entry or retrieval forms and pages. E.g., if
a table had 10 columns and the 4GL programmer started to build a data
entry form, the 4GL tool would offer a menu asking which of the 10
columns should be included in the new form and then generate a
prototype. The human programmer could edit this automatically
generated code by, for example, adjusting the order in which fields
were presented or changing the length of an input box. Various
Microsoft tools, including Visual Studio (introduced in 1995, but
incorporating older software), included similar features and
popularized the term "wizard" to describe software that led a human
through a series of menus and eventually created a computer
program based on the choices made.
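The metadata-driven form generation that these tools performed can be sketched as follows, with SQLite's PRAGMA standing in for the catalog queries a 1980s 4GL would have issued against Oracle or Sybase; the table and the form's URL below are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mailing_list (email varchar(100), name varchar(100))")

def generate_entry_form(conn, table):
    # Ask the RDBMS for the names of the table's columns, then build a
    # prototype data-entry form for a human programmer to edit.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    fields = "\n".join(f'  {c}: <input type="text" name="{c}">' for c in columns)
    return (f'<form method="POST" action="/{table}/insert">\n'
            f'{fields}\n  <input type="submit">\n</form>')

print(generate_entry_form(conn, "mailing_list"))
```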
One advantage claimed for these 4GL tools was
"platform-independence". The PowerBuilder or Oracle Forms programmer,
for example, did not need to produce separate versions for Microsoft
Windows and Apple Macintosh computers. The tool developer, e.g.,
Oracle or Sybase, would write some operating system-specific code and
then the customer's 4GL program would execute on top of
that. Companies that installed a mid- or late-1990s release of the 4GL
tools would have an additional option of running their 4GL application
on a Web server. End-users would then interact with the application
through a standard Web browser making HTTP requests and receiving HTML
pages in return. Using this method, some companies were able to make
internal database applications work as Web applications without
writing any new software.
V. What is a Web-based database application?
The '226 patent's title includes the phrase "Web-based Database
Application". What does this mean? It is simply a combination of the
systems described above, i.e., "a computer program that can be used by
a user sitting in front of a standard Web browser whose primary
purpose is entering information into and retrieving information from a
computer-managed database". No additional software needs to be
installed by end-users. The computer programs on the server, however,
are modified so that they rely on a database management system (DBMS)
for storing and retrieving information. The user requests pages by
URL, as before, and in response the server will run computer programs
(page scripts) that will generally access the DBMS and then merge the
results with a template in order to build up a complete HTML page to
return.
VI. How were Web-based database applications built in the 1990s?
For a Web-based database application being built from scratch, the
classical development process included (and still includes, as of
2011) the following steps:
1. Learn the business requirements from managers and users.
2. Develop a data model (SQL schema) to support those requirements.
3. Copy any existing data into the new RDBMS tables.
4. Write the application pages (computer programs) with which end-users interact.
5. Change the data model as requirements evolve.
6. Change the application pages to match the revised data model.
Note that Steps 5 and 6 will be repeated throughout the life of the application.
A. Steps 1 and 2 Example
Suppose that the programmer learned from the business manager that a
goal was to add a feature to the company's Web site whereby visitors
could sign up to receive email notifications of new products. The
result might be the development of the following SQL schema:
create table mailing_list (
    email varchar(100) not null primary key,
    name varchar(100)
);

This is a SQL command that tells the RDBMS to make space for a new
table called mailing_list. This table will have two columns, both
variable length character strings. This single table can be considered
the application database as it holds all of the data to be processed
by the application.

The programmer has added two integrity constraints on the email
column: not null primary key. The not null constraint will prevent any
program from inserting a row where name is specified but email is
not. The manager said that the company did not want to require a full
name, but needed the email address in order to send out the
announcements. The primary key constraint tells the database that this
column's value can be used to uniquely identify a row. That means the
system will reject an attempt to insert a row with the same e-mail
address as an existing row and therefore prevent customers from
receiving duplicate emails. Note that a real-world SQL schema could
have additional constraints, e.g., that submitted email addresses
match a pattern that included an "@" in the middle.
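The effect of these two constraints can be demonstrated with a small sketch, again using SQLite as a stand-in for a full-scale RDBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table mailing_list (
        email varchar(100) not null primary key,
        name varchar(100)
    )
""")
conn.execute("insert into mailing_list (email, name) values (?, ?)",
             ("philg@mit.edu", "Philip Greenspun"))

# The not null constraint rejects a row with a name but no email.
try:
    conn.execute("insert into mailing_list (email, name) values (?, ?)",
                 (None, "Anonymous"))
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# The primary key constraint rejects a duplicate email address.
try:
    conn.execute("insert into mailing_list (email, name) values (?, ?)",
                 ("philg@mit.edu", "Duplicate Subscriber"))
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```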
B. Step 3: Copy data into the RDBMS tables
If the company already has a spreadsheet containing customer names and
email addresses, the next step might be to copy those data items into
the new SQL table:

Philip Greenspun    philg@mit.edu
Bill Gates          billg@microsoft.com
Scott Adams         scottadams@aol.com

Note that in this example there is some information about the
structure and organization of the information in the spreadsheet. We
have data items that are pre-divided into rows and columns. This makes
it a much simpler task for a human to write a program to transfer
these data items into the corresponding columns of the mailing_list
table.
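A sketch of such a transfer program, assuming the spreadsheet has been exported to a two-column CSV file; the file name and layout are illustrative assumptions, and the sketch writes a small stand-in file so that it is self-contained:

```python
import csv
import os
import sqlite3
import tempfile

# Stand-in for the exported spreadsheet (hypothetical file name).
csv_path = os.path.join(tempfile.gettempdir(), "subscribers.csv")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows([
        ["Philip Greenspun", "philg@mit.edu"],
        ["Bill Gates", "billg@microsoft.com"],
        ["Scott Adams", "scottadams@aol.com"],
    ])

conn = sqlite3.connect(":memory:")
conn.execute("create table mailing_list "
             "(email varchar(100) not null primary key, name varchar(100))")

# Each spreadsheet row is pre-divided into columns, so the transfer
# program is a simple loop over rows.
with open(csv_path, newline="") as f:
    for name, email in csv.reader(f):
        conn.execute("insert into mailing_list (email, name) values (?, ?)",
                     (email, name))
conn.commit()

count = conn.execute("select count(*) from mailing_list").fetchone()[0]
print(count)
```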
C. Step 4: Write application pages (computer programs)
Finally the human programmer will write the application pages (page
scripts) that make it possible for end-users to interact with the
contents of the database. At least the following pages are likely to
be required:
Note that the structure and function of many of these pages are
predictable and flow more or less deterministically from the SQL table
definition. For example, the signup page will offer "email" and "name"
fields. If it offered a "phone number" field there would be no place
to store the entered data. Due to the predictability and repetitive
nature of writing the software to run in response to requests for
these URLs, a programmer even in the 1990s might rely on a variety of
automated and semi-automated tools, sometimes called "wizards", to
help in developing at least the skeletons of these pages. Typically
pages produced by a wizard would require editing by a
human programmer in order to create a functional and useful application.
D. Steps 5 and 6: Change the data model and application pages
Suppose that a manager asks "Wouldn't it be nice if we could also ask
folks for a phone number?" The programmer could add a column for up
to 20 characters (allows country codes and symbols such as "+" and
parentheses):
alter table mailing_list add (phone_number varchar(20));
Or the programmer could add an additional table so that home, work,
and mobile numbers could be stored for each subscriber:
create table phone_numbers (
email varchar(100) not null references mailing_list(email),
number_type varchar(15) check (number_type in ('work','home','cell','beeper')),
phone_number varchar(20) not null
);
Which application pages must now be changed?
In this example, as a consequence of a very simple change to the data
model (adding the ability to record phone numbers), 5 out of the 7
computer programs (in bold face above) that constitute the application
must be changed. Every change is conventionally done by a human
programmer opening a file in a text editor and typing, which creates a
risk of an error being introduced into what had been a perfectly
functioning application.
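The ripple effect can be sketched briefly. Note that SQLite spells the column addition "add column"; the Oracle-style "add (...)" shown above is equivalent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mailing_list "
             "(email varchar(100) not null primary key, name varchar(100))")

# Step 5: the data-model change is a single statement.
conn.execute("alter table mailing_list add column phone_number varchar(20)")

# The table now has three columns...
columns = [row[1] for row in conn.execute("PRAGMA table_info(mailing_list)")]
print(columns)  # ['email', 'name', 'phone_number']

# ...but a signup page written against the old schema still inserts
# only (email, name), leaving phone_number empty, and the confirmation
# and administrative pages will not display the new column until a
# human programmer edits each one (Step 6).
conn.execute("insert into mailing_list (email, name) values "
             "('philg@mit.edu', 'Philip Greenspun')")
row = conn.execute("select phone_number from mailing_list").fetchone()
print(row)  # (None,)
```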
VII. Approaches to automatic generation of application code in the
1990s
Simultaneously with the rise of the RDBMS and the 1980s client/server
world, enterprises continued to ask "Can we cut the cost and time
required to develop database applications?" Expert programmers have
always been scarce and do not necessarily have the best understanding
of business requirements. Furthermore, it was observed that once the
SQL tables and columns had been defined, portions of the likely
structure of the application could be sketched out by a computer
program. If there was, for example, a table called CUSTOMERS with a
column CUSTOMER_NAME, there would need to be a form where a new
customer could be entered, including a field for CUSTOMER_NAME, and
also the application would need to include a form for editing the value
of that field, in the event that the customer was acquired or
otherwise changed its business name.
A. Automatic code generation without using a client/server tool
PowerBuilder and similar systems from the client/server era saved a
lot of programmer time, but many of their features, e.g., the ability
to generate programs that would run on a desktop computer, weren't
necessary in the Web era. In 1999 a small group of programmers
building Sharenet, a knowledge sharing system for Siemens, made an
independent effort to reduce development time, revision time, and
errors introduced during revisions. The programmers built a complete
machine-readable specification of the application and then wrote a
program-to-write-the-program. The program-to-write-the-program read
the specification of what needed to be stored and how it was to be
presented to end-users and generated all of the computer programs (SQL
schema plus page scripts) for the application. When Siemens requested
substantial changes to the application following a test of the
prototype, the changes were made to the machine-readable specification
and the entire application was quickly regenerated. This process is
described in "Metadata (and
Automatic Code Generation)", a chapter within the MIT Press
textbook Software Engineering for Internet
Applications.
A more common approach taken by 1990s programmers who were given the task of reducing the amount of custom software development for individual applications was building computer programs whose behavior varied depending on information stored in the DBMS. For example, Viaweb, which became Yahoo! Store in 1998 and still exists under Yahoo! Small Business, offered multiple online stores, each for a different merchant, but served by a common set of computer programs and database. The programmers could predict that nearly every merchant would want to store, for each product, a name, a description, and a price. However, a camera shop might want to record and display "Lens Mount" for most of its products, reflecting whether a particular item was designed to fit onto a Nikon or Canon camera, for example. Yahoo! Store allowed a merchant to define a "custom field", which resulted in metadata (data about data) being recorded in the database and in the interface for that particular merchant's customers being updated to show Lens Mount. The same metadata would result in the administrative pages for the online store including options to set or update the lens mount field.
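The custom-field technique can be sketched as follows. The table and field names here are hypothetical illustrations, not Viaweb's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Metadata table: which extra fields each merchant has defined.
db.execute("create table custom_fields (merchant text, field_name text)")
# Values entered for those fields on particular products.
db.execute("create table product_values "
           "(merchant text, product text, field_name text, value text)")

# The camera shop defines a custom field; other merchants do not.
db.execute("insert into custom_fields values ('camera_shop', 'Lens Mount')")
db.execute("insert into product_values values "
           "('camera_shop', '50mm f/1.8', 'Lens Mount', 'Nikon F')")

def product_display_fields(conn, merchant):
    # Every merchant gets name/description/price; custom fields are
    # added per the metadata, with no change to the shared program code.
    base = ["name", "description", "price"]
    extra = [r[0] for r in conn.execute(
        "select field_name from custom_fields where merchant = ?", (merchant,))]
    return base + extra

print(product_display_fields(db, "camera_shop"))
print(product_display_fields(db, "book_shop"))
```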
A later example of the "custom field" technique is photo.net's photo
sharing system, developed in the spring of 1999 as an MIT student
project (and still running today at http://photo.net/gallery/). More
information about the development of this system, including the SQL
schema that supports it, is available here.
The file system is a standard part of most operating systems
and allows named collections of information to be stored persistently,
e.g., on a hard disk drive, and then retrieved by name. The file
system on a server is no different than the file system on a desktop
computer. Furthermore, the file system on a separate physical machine
can typically be read from and written to just as easily as the local
file system. If a group of physical or virtual computers are
cooperating to deliver a Web application, they may share one file
system so that they can all load the same application pages and so
that information written by one computer can be read by another. In
that case, a separate computer, physical or virtual, would be
dedicated to running the file system and would be called the "file
server". A variety of cloud computing providers offer a file system
service; examples include Google Storage and Amazon S3. These free the
owners of the application from having to purchase and maintain server
hardware or worry about backing up information in a file system.
The database management system (DBMS) allows multiple
simultaneous updates to be handled in an orderly fashion and at the
same time as queries are responded to. Very commonly, the DBMS is a
performance bottleneck and therefore is often split off onto a
separate computer, physical or virtual. A "database server" requires a
combination of software and hardware (though the hardware could be
virtual). Examples of popular DBMS software include Microsoft SQL
Server, MySQL, and Oracle. Administering a DBMS can be challenging,
particularly because of the availability and reliability
required. Therefore some cloud computing services offer database
management systems, albeit with many fewer features than traditional
RDBMSes such as Oracle. Examples include the Google App Engine
Datastore and Amazon SimpleDB.
All of this software could run on any general-purpose computer running
a standard operating system such as Unix or Microsoft Windows. In the
early days of the Web, it was not uncommon for a programmer to host a
Web site on his or her desktop computer. The resulting site would be
accessible to others within a company (if the desktop machine were
connected to a local-area network) or Internet users around the world
(if the desktop machine were connected to the public Internet, a
common situation at universities). As Web sites grew in popularity, it
became common to use dedicated servers to respond to user
requests. Starting in the late 1990s, the virtual machine idea,
pioneered by IBM in the 1960s, was resurrected. To achieve total
isolation between two Web applications, for example, each could run in
its own virtual machine on top of a single physical machine (i.e., a
physical computer). Initially this was done by the owners of physical
machines for their own purposes; today it is possible to rent
virtual machines from a number of different "cloud computing"
providers. A heavily used application, such as Facebook, will require
multiple computers (physical or virtual) to handle all of the
requests. (http://www.datacenterknowledge.com/the-facebook-data-center-faq-page-2/
quotes a Facebook employee saying that "When Facebook first began with
a small group of people using it and no photos or videos to display,
the entire service could run on a single server" and then notes that
"as of June 2010 the company was running at least 60,000 servers").
VIII. Hardware and software required to host a Web-based database application
The software required to host a typical Web-based database application
includes the following:
The HTTP server program processes requests from browsers and
delivers responses to those requests. A response could be the contents
of a file, a "not found" error message, or the output of running a
computer program. Popular examples of HTTP servers are Apache and
Microsoft Internet Information Server. Generally an efficient means
for running short computer programs in response to Web requests is
also required (the old CGI standard is functional, but not
efficient). Examples of additional software that might be loaded into
the HTTP server include a PHP processor or Microsoft Active Server
Pages.
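A minimal sketch of an HTTP server that runs a short program in response to each request, using Python's standard library in place of Apache or IIS; the port and the response text are illustrative:

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class PageScriptHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The "page script": output is computed on the server at
        # request time rather than read from a static file.
        body = f"As of {time.ctime()}, the server is up.\n".encode()
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PageScriptHandler).serve_forever()
```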
IX. U.S. Patent 6,832,226 (the "'226 Patent")
Going back to the six steps involved in a hand-authored Web-based
database application, the set of computer programs referred to by the
'226 Patent would eliminate most or all of this labor and all of the
required technical
training. See Col. 1:39-41: "development of web-based database
applications without needing a programmer or web developer;" see also
Col. 1:44-46: "the ability of a user to change the look and feel of
the application pages without needing a programmer or web
developer".
"The first byte of each page seems to be a type identifier, for instance, the first page in the mdb file is 0x00, which no other page seems to share." "In Jet4, an additional four byte field was added. It's purpose is currently unknown."
The challenge starts in a similar manner to that confronted by the archaeologists who uncovered the first tablets in Linear A, the written language of the Minoans around 1800-1450 B.C. They had obtained a stream of symbols, but didn't know what any of them meant and, if they were phonetic, did not know any words in the Minoan language. [Despite more than 100 years of effort, Linear A remains undeciphered to this day.] A computer program that receives a stream of 1s and 0s must first decide whether it is a stream of 64-bit "words" for a PlayStation 3 Cell processor, a stream of 32-bit "words" for an older Intel Pentium chip, a stream of 8-bit ASCII characters, or a stream of 4-bit "nibbles" holding binary-coded decimal digits.
Suppose that Program 2 of the '226 Patent is able to determine that the bits in an uploaded file represent 100 decimal digits. That's tremendous progress over "a string of 1s and 0s", but many interpretations of these 100 digits remain. The uploaded file could be a complete 100-digit number. It could and almost surely would correspond to 100 digits within the infinite sequence of digits of Pi. It could be ten 10-digit phone numbers. The '226 patent refers to a program that can figure out, from among the infinite number of possibilities for interpreting these 100 decimal digits, what the numbers represent, pick a name for the database table to store this information, divide the information into rows and columns, pick one or more column names, and pick data types, e.g., "number" or "date", for each column.
A simpler concrete example is the application data "7567075703". Here are some possibilities for interpreting these digits:
- A table phone_numbers with one column, whole_number, of type char(10).
- A table zip_codes with one column, five_digit_code, of type char(5); two rows (or "records") will be inserted into this table, one for each zip code.
- A table inventory with two columns: product_id char(3), quantity integer.
- A table important_dates with one column, unix_timestamp, of type integer.
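The competing interpretations can be made concrete in a few lines:

```python
# The same ten digits admit many structurally different
# interpretations; here are the ones listed above.
digits = "7567075703"

# One ten-digit phone number (one row in phone_numbers).
phone_numbers = [digits]

# Two five-digit zip codes (two rows in zip_codes).
zip_codes = [digits[:5], digits[5:]]

# One inventory row: a 3-character product id plus an integer quantity.
inventory = (digits[:3], int(digits[3:]))

# One Unix timestamp: the whole number as an integer.
unix_timestamp = int(digits)

print(zip_codes)   # ['75670', '75703']
print(inventory)   # ('756', 7075703)
```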
The '226 patent then populates a computer-readable specification of everything relevant to the new application. This specification, similar in concept to that used for the 1998 Siemens Sharenet project described above, is itself stored in database tables called in the patent the "data dictionary tables" (Figure 11). The data dictionary tables have the same names and structure (number and type of columns) for all applications. Each application, however, will insert rows with different values into this structure. In other words, these are the same tables, but populated differently. The '226 patent provides editing facilities for a human to adjust or change some of the work done by the automated program in populating the data dictionary tables. Each change by a human from one of the forms in Figure 9E through 9AI will result in updating a value in a data dictionary table or possibly adding or deleting a row.
Once the data dictionary is established, the information from the data dictionary is read and used to build a SQL schema or data model for the application itself. The data dictionary is metadata ("data about data"). Before an application can function, however, there must be tables that can store the data items themselves. A "third program" that can accomplish this task is referred to in Col. 2:44-45. The "third program" receives an additional three lines of description in Col. 5:22-24.
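The data-dictionary idea can be sketched as follows. The dictionary table's name and layout here are invented for illustration; they are not the actual tables of the '226 patent:

```python
import sqlite3

# Hypothetical data dictionary: metadata describing the application's
# tables and columns, itself stored in database tables.
meta = sqlite3.connect(":memory:")
meta.execute("create table dd_columns "
             "(table_name text, column_name text, data_type text)")
meta.executemany("insert into dd_columns values (?, ?, ?)", [
    ("mailing_list", "email", "varchar(100) not null primary key"),
    ("mailing_list", "name", "varchar(100)"),
])

def generate_schema(conn):
    # The "generator" reads the dictionary and emits one CREATE TABLE
    # statement per table described there; change the metadata and the
    # schema can be regenerated rather than edited by hand.
    statements = []
    tables = [r[0] for r in conn.execute("select distinct table_name from dd_columns")]
    for t in tables:
        cols = conn.execute(
            "select column_name, data_type from dd_columns "
            "where table_name = ? order by rowid", (t,))
        col_defs = ", ".join(f"{name} {dtype}" for name, dtype in cols)
        statements.append(f"create table {t} ( {col_defs} )")
    return statements

print(generate_schema(meta)[0])
# create table mailing_list ( email varchar(100) not null primary key, name varchar(100) )
```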
Once the data dictionary is updated, the SQL schema for the
application database is changed (6:31) and "the application's web
pages are regenerated" (6:30). The "web pages" referred to here are
computer programs that have the ability to send queries to the
application database and that ultimately generate the HTML that is
rendered by the end-user's browser. Thus a layer of
dynamism is added compared to the conventional Web-based database
application. The conventional application includes computer programs
that run in response to page requests, thus making it possible for
end-users to receive a Web page with up-to-date information just
inserted into the database. If the database structure is changed, a
human programmer must be hired to update the conventional application
page scripts. By contrast, the computer programs built and maintained
by the system of the '226 Patent are themselves dynamic and will be
changed automatically when necessary.
If the programs referred to in the '226 Patent worked, they would
substantially automate nearly every step of the software development
process for a database application. Given a stream of data items,
these programs would infer the structure and organization of the data
and design the SQL schema or data model for an application built
around the data. The programs would fill the newly created application
tables with existing data and generate the computer programs ("page
scripts" or "application pages") necessary for users to work with the
completed database from their Web browsers. Finally, the programs
would keep the complete application up to date as user requirements
change.
X. Summary
Database applications have been among the most useful and important
applications of computer systems since the late 1950s. A Web interface
to a database application enables use by people scattered around the
globe, connecting from desktops, laptops, and smartphones without the
need to install additional software. Developing database applications,
with and without Web interfaces, has been the life's work of hundreds
of thousands of programmers worldwide and has required the skills of
understanding business requirements, predicting user behavior, and
creative software design.