Chapter 8: Java and Shockwave - The <BLINK> Tag Writ Large
by Philip Greenspun, part of Database-backed Web Sites
Note: this chapter really suffers on the Web. Macmillan did some lovely napkin drawings that you will only be able to see if you buy a copy on real dead trees. I hope to some day convert these for the Web but I'm too busy at present.
I'm hoping that you didn't buy this book because you thought it would teach you Java. But on the off chance that you did, I'm going to start this chapter by writing down everything that I know about Java. If you already know Java, then you'll probably want to skip over this section.
Java is a computer language. To understand whether it is a good or bad computer language, you have to ask, "Why do I need a computer language?" After all, if you were to buy a bare Pentium processor chip from Intel, it would come with a book listing valid instructions for the Pentium, such as "add two numbers together," "subtract 34 from the number in Register %eax," or "jump to address #457 if the result of the last subtraction was 0." It is a bit tough to see how adding up several million of these instructions would turn into a program like Netscape Navigator, Adobe PhotoShop, or the Oracle relational database management system (see Figure 8-1).
Figure 8-1: Computer languages used by programmers are designed to be readable by humans. It is inefficient to build computer hardware that understands human-readable instructions. A hardware processor understands simple machine codes such as "load memory location #4576 into Register %eax." The Pentium processor used in this example understands i386 machine code. A computer program called a compiler translates human-readable languages such as C and Java into i386 machine code. This method of getting programs to run has two principal drawbacks. Drawback 1: every time the programmer wants to test a change, he or she has to rerun the compiler. This slows down software development. Drawback 2: the output of the compiler is not typically portable. It will only run on a Pentium. It may only run on a Pentium with Windows NT 4.0. Every time you run a program in an interpreted language such as Tcl, you send the human-readable code to an interpreter. This works around both drawbacks but results in rather sluggish execution. Java "the system" allows the programmer to compile to Java interpreter byte codes. Testing changes still requires a tedious recompilation, so we haven't escaped Drawback 1. However, the output of the compiler is portable to any computer running the Java virtual machine.
John Locke was apparently having this same difficulty back in 1690 when he wrote "An Essay Concerning Human Understanding":
"The acts of the mind, where it exerts its power over simple ideas, are chiefly these three: 1. Combining several simple ideas into one compound one, and thus all complex ideas are made. 2. The second is bringing two ideas, whether simple or complex, together, and setting them by one another so as to take a view of them at once, without uniting them into one, by which it gets all its ideas of relations. 3. The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction, and thus all its general ideas are made."
[Note: Before you congratulate me on my literacy, let me admit that I stole this quote from the only good book on computer programming that I've ever read, Structure and Interpretation of Computer Programs (Abelson and Sussman; MIT Press, 1996).]
One does not program in i386 machine code because it lacks powerful means of combination, by which compound expressions are built from simpler ones, and means of abstraction, by which compound objects can be named and manipulated as units. Without means of combination and means of abstraction, a big computer program would never be understandable to a human.
Over the past four decades, many high-level computer languages have been developed. The most powerful of these languages is Lisp, invented at MIT in 1959 and today used only by two groups: people who are trying to solve extremely difficult problems (the best layout for a big integrated circuit, or "micro chip," for example) and, paradoxically, people who don't think of themselves as programmers at all (draftsmen who are adding shortcuts to AutoCAD).
One of Lisp's best features is that it allows the programmer to think of the computer as having infinite memory. Interactive programs have to create and destroy objects all of the time; for example, a word processor has to store a new paragraph typed by the user but then delete it if the user changes his mind. Lisp provides functions to create data structures but none to delete them. This would work great if you could go down to the computer store and bring home a physical machine with infinite memory. Since you can't, Lisp incorporates a garbage collection system that tracks down data structures that are no longer being used. This space is then scavenged and made available so that the running program can continue to behave as though the memory were infinite.
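Java, the subject of this chapter, borrows the same idea. Here is a minimal sketch of the word processor case, written in present-day Java syntax; the DraftDocument class and its method names are made up for illustration. Note that there is a way to add a paragraph and a way to forget one, but nothing that frees memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical word-processor document, invented only to illustrate the text.
class DraftDocument {
    private final List<String> paragraphs = new ArrayList<>();

    // Storing a paragraph allocates memory behind the scenes.
    void addParagraph(String text) {
        paragraphs.add(text);
    }

    // "Deleting" only drops the reference. There is no free() or delete;
    // the garbage collector reclaims the string once nothing refers to it.
    void removeLastParagraph() {
        if (!paragraphs.isEmpty()) {
            paragraphs.remove(paragraphs.size() - 1);
        }
    }

    int paragraphCount() {
        return paragraphs.size();
    }

    public static void main(String[] args) {
        DraftDocument doc = new DraftDocument();
        doc.addParagraph("It was a dark and stormy night.");
        doc.addParagraph("On second thought, never mind.");
        doc.removeLastParagraph(); // the user changed his mind
        System.out.println(doc.paragraphCount() + " paragraph(s) left");
    }
}
```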
The most popular computer language of the early 1990s was C. A C programmer is supposed to track every object he has created and explicitly identify which ones are no longer needed. If 20 programmers are working together on a big system, they all have to coordinate with each other on the storage allocation scheme. If one of the 20 programmers makes a mistake, the same memory location ends up being used by two subroutines and the program begins to return erroneous results. Usually the program will crash eventually, bringing down the entire machine in the case of Macintosh and non-NT Windows operating systems.
The "perennially crashing C program problem" was attacked in different ways. Big corporations attacked it by moving their critical programs into safe languages, notably SQL, the declarative language of their relational database management systems. Users attacked it by saving their work every five minutes. Computer language nerds attacked it by inventing new computer languages.
Java is one of those new languages. As a language per se, it doesn't look like much of a revolution. Java has some of the most important features of Lisp, like automatic storage management and an object system. But so do dozens of these other new computer languages. And so, for that matter, did Lisp implementations of the 1970s. If people wanted to write software and compile it into machine code, they'd be a lot better off writing Common Lisp. So there must be more to Java.
Java is an interpreter. Lisp could run as an interpreted language back in 1959 so there must be something else.
Java is an interpreter installed on almost every Internet user's desktop. You can write a Java program, compile it to Java interpreter byte codes, and then it will run on any computer with a standard Web browser installed (see Figure 8-2). This is the most interesting thing about Java. MS-DOS was not an innovative system. Yet it was worth programming for because it was so widely installed. Java will be a lot more widely installed than MS-DOS.
Figure 8-2: Java applets are written by a programmer working on a development machine. He or she types in Java source code and compiles it to produce an applet in Java byte code. This applet is then FTP'd to a public Web server and linked into Web pages. The Web server machine is just distributing bytes and need not run any special software. Users loading Web pages from the server will be able to run the applet as long as their browser incorporates a Java virtual machine. The programmer need never be aware of the user's choice of computer or operating system.
It is true that most of the hyped techno-advantages of Java were available to thoughtful programmers in 1960. However, Java contains a few genuinely new ideas. The most important of these is the byte-code interpreter's security system. The Java team really thought about the implications of grabbing code off an untrusted network and running it alongside trusted software with access to private information.
There are three components to Java virtual machine security:
The Verifier looks through the downloaded instructions to make sure that they can't do anything illegal. Most of this checking is type inference: look at the input data types, look at the operations performed on that data, and make sure that the results are of legal types.
The Class Loader is responsible for grabbing Java binary classes from the network and managing their interaction. It is possible to implement a Class Loader that partitions interactions among Java code. For example, if you were building a Web browser, you might want to prevent classes loaded from different sites from interacting with each other.
The Security Manager handles requests by Java byte code for system resources such as local files, network ports, and input/output hardware. If you were using a Java interpreter to run a company network with company software, you'd probably want a Security Manager with very loose policies. Java code ought to be able to read and write the local disk, talk to any other computer on the company network, and display output to the user. If you were writing a Web browser to run Java code downloaded from foreign sites, you'd want a very unforthcoming Security Manager. Access to local files will be forbidden. Access to the network, except to talk to the IP address from which the applet was downloaded, will be forbidden.
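The browser-style policy just described can be sketched in a few lines. This is an illustration only: AppletSandboxPolicy is a made-up class, not the Security Manager interface that ships with Java, and it compares host names where a real browser would compare IP addresses.

```java
// Hypothetical policy object; this is NOT the real Java Security Manager API.
class AppletSandboxPolicy {
    private final String originHost; // host the applet was downloaded from

    AppletSandboxPolicy(String originHost) {
        this.originHost = originHost;
    }

    // Browser policy from the text: access to local files is forbidden.
    boolean mayReadFile(String path) {
        return false;
    }

    // Browser policy from the text: the applet may talk only to its origin.
    boolean mayConnect(String host) {
        return originHost.equalsIgnoreCase(host);
    }

    public static void main(String[] args) {
        // Both host names below are placeholders.
        AppletSandboxPolicy policy = new AppletSandboxPolicy("www.example.com");
        System.out.println("read local file: " + policy.mayReadFile("/etc/passwd"));
        System.out.println("connect to origin: " + policy.mayConnect("www.example.com"));
        System.out.println("connect elsewhere: " + policy.mayConnect("other.example.net"));
    }
}
```

A company-network policy would presumably be the same class with both methods returning true far more often.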
How well does all this security stuff work? Not so well that Java security holes don't periodically make the front page of the New York Times. These aren't holes in the fundamental scheme, though. They are problems with Sun's implementation of the Verifier or Netscape's implementation of the Class Loader and Security Manager. Unlike half-baked systems like Microsoft's ActiveX, Java security is fundamentally sound.
Another new feature in Java is multithreading. There have been multithreaded Lisps for decades. You can even exploit multithreading in AOLserver Tcl scripts thanks to Doug and Jim's featureful API. But Java puts multithreading into the language spec so your code is portable among implementations and operating systems. Virtually every Internet application requires multithreading. One thread talks to the network, one thread listens for user input, one thread drives the sound card, one thread manages windows and menus.
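As a rough sketch of what "multithreading in the language spec" buys you, the following starts one thread per task and waits for all of them. It uses present-day Java syntax (lambdas did not exist in 1997), and the ThreadSketch class and the task names are invented for this example. A real application would block on a socket or wait for the mouse where this one just prints a line.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the task names are placeholders.
class ThreadSketch {
    // Starts one thread per task name, waits for all, returns how many finished.
    static int runAll(String... taskNames) {
        AtomicInteger finished = new AtomicInteger();
        Thread[] threads = new Thread[taskNames.length];
        for (int i = 0; i < taskNames.length; i++) {
            String name = taskNames[i];
            threads[i] = new Thread(() -> {
                // Stand-in for real work such as reading from the network.
                System.out.println(name + " thread running");
                finished.incrementAndGet();
            }, name);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // block until this thread has completed
            } catch (InterruptedException e) {
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        int done = runAll("network", "user-input", "sound", "windows");
        System.out.println(done + " threads finished");
    }
}
```

The order in which the four lines print can differ from run to run, which is the whole point: the threads really are independent.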
Oh yes, there is one little problem with developing Java applets: they will inevitably crash the user's browser.
The Java PR literature explains on every third page how Java will finally save you from all those buggy C programs. Unlike C, Java doesn't let those cubicle dwelling drones allocate storage and manipulate pointers. Unlike C, Java doesn't screw up arithmetic when you move a program from one computer to another. C sucks. Java rules.
And what language did these bold pioneers use to implement the Java virtual machine?
Of course, there are moderately reliable C programs. The Unix operating system, for example. It has only taken 25 calendar years and several hundred thousand programmer-years to get most of the bugs out.
Here's my capsule history of software reliability . . .
In the 1970s, programmers wrote safe programs on top of a reliable high-level substrate. They were using Lisp, a language that was produced by some of the best programmers in the world for their own use. Bad programmers often wrote reliable programs and great programmers pushed back the frontiers.
In the 1980s, programmers wrote unsafe programs on top of a reliable low-level substrate. They were using C, a low-level language that doesn't have much of a run-time system. Assuming the CPU hardware was functioning correctly, a great C programmer could write a program that would not crash. Of course, most C programmers either weren't so great or were attempting to build complex systems. The programs crashed and sometimes took the user's whole computer with them (with a Macintosh or Windows, for example). Even the best programmers were unable to do more than write copies of systems that had been built in safe languages during the 60s and 70s.
We're back to the 1970s and a safe language (Java) but this time we're using an unreliable high-level substrate (the Java virtual machine). Thus no matter how skilled the Java programmer, no Java program can ever run reliably. It is only a matter of time before the Java applet that you publish will crash the user's Web browser. Unless he is using Windows NT or Unix, there is a fair chance that your Java applet will crash his entire computer.
There is enough money behind Java that I suppose someone will eventually write a reliable virtual machine and window system. Just don't hold your breath.
To write Java applets for your Web site, you need something to translate your Java source code, designed to be human-readable, into Java byte codes that will be understandable by the interpreter running inside your users' browsers. In more humble times, you'd use a compiler. Now you really need a development environment. If there is a mistake in your program, the development environment will show you where it is in the source rather than barfing up a cryptic error message.
As of February 1997, the consensus among my Java nerd friends is that Symantec Café (http://www.symantec.com) is the best development environment. They also cautioned me against using any of the special Symantec classes, which are allegedly slow and buggy. Symantec also produces dbANYWHERE, the currently favored technique of getting Java on Windows machines to talk to a relational database.
As for documentation, there are plenty of Java books out there. Some of them hit the streets even before the applet classes were finalized. Unfortunately, it would seem that it takes a bit longer to write a clear language tutorial than it does to make an insta-book biography of Marcia Clark. My friend Sean tells everyone to buy Teach Yourself Java in 21 Days (Perkins and Lemay; Sams, 1996), but supposedly this is obsolete advice. Folks on the Net seem to like The Java Tutorial, available in hypertext at http://www.javasoft.com/nav/read/Tutorial/index.html or as an 831-page pile of dead trees (Campione and Walrath; Addison-Wesley, 1996). If you want technical information right from the horse's mouth, then you need The Java Language Specification (Gosling, Joy, and Steele; Addison-Wesley, 1997). Guy Steele was a driving force behind standardizing Lisp back in the 1980s so he knows something about computer languages.
I like O'Reilly books. They don't have enough money to flood the world with advertising. They don't have enough MBAs to realize that time-to-market is more important than quality. I'd start with Exploring Java (Niemeyer and Peck; O'Reilly, 1996), a 400-page tutorial. Once you've read all of that, you can keep Java Language Reference (Grand; O'Reilly, 1997) next to your 3M Precise Mousing Surface (like a mouse pad, but good). If you heeded the rest of my advice in this chapter, you are probably using Java for a network application, in which case you'll want Java Network Programming (Harold; O'Reilly, 1997). If on the other hand, you've gone over to the dark side and just want to simultaneously tickle every multimedia device on your readers' machines, then you need Java Threads (Oaks & Wong; O'Reilly, 1997).
You should learn Java. I predict that it will gradually supplant C over the next ten years. Java is going to be big. You heard it here first.
"Maybe If I Make Stuff Flash I'll Get More Traffic. Well, It Is Easier Than Writing Content Anyway."
I talked to a glass blowing artist when he was planning his Web site. He wanted to hire the slickest Web design firm in Boston. "My work is visual. I have to have an amazingly good-looking Web page," Tony said.
How is anyone going to find your good-looking site? AltaVista doesn't recognize images and Java applets and graphic design. It only indexes text. For a lot less money, you could buy the online rights to an interesting book on glass blowing. Then anytime anyone typed the query "glass blowing" into a search engine, they'd get to your site and you could sell them glassware.
"But I want my site to be really cool," Tony responded.
If you are trying to demonstrate the workings of a mechanism, show the steps of an algorithm, or just have some fun with a strip show, then animation can be useful. But try to be careful that you aren't just spending a lot of time and money turning your pages into visual irritations. Also remember that until Java implementations stabilize, you are always running the risk of crashing your users' browsers and, with Macintosh and non-NT Windows operating systems, their entire machines.
It is finally worth remembering what brought users to the Web in the first place: control and depth. Software like Java and Shockwave enables you to lead users around by the nose. Flash them a graphic here, play them a sound there, roll the credits, and so on. But is that really why they came to your site? If they want to be passive, how come they aren't watching TV or going to a lecture?
It seems like an obvious point, but I mention it because I've seen so many tools to convert PowerPoint presentations into Web sites. The whole point of a ViewGraph-based presentation is that you have a room full of people all of whose thoughts have to be herded in a common direction by the speaker. Ideas are condensed to the barest bones because there is such limited time and space available and because the speaker is going to embroider them. The whole point of the Web is that each reader finds his own path through a site. There is unlimited time and space for topics in which the reader has a burning interest.
Despite the flaws in today's Java implementations, the idea of moving programs around the network was great in the 1970s and is still great. If nothing else, it is a technology that promises to free us from a lot of system administration pain. You paid for a powerful processor on your desktop so you ought to be able to run lots of programs. Yet you don't really want to be responsible for installing all those programs, making sure that your operating system version is compatible, and upgrading to newer versions of the application.
Client-side Java can make a good site great in the following situations:
A richer user interface is always harder to learn. Your readers don't want to learn how to use new programs. They already learned how to use a Web browser and probably also word processing, spreadsheet, and drawing programs. However, it is possible that you can come up with a Java applet that delivers such great benefits that people will invest in learning your user interface.
Suppose I'm doing a mundane camera ownership survey. If the user owns a point-and-shoot camera, it doesn't make sense to ask which accessory lenses and flashes he owns. However, if he says, "I have a Nikon 8008," then I'd like to present a list of Nikon flash model numbers as options. I can do this now by essentially asking one question per HTML form. The user says, "I own an SLR (Single Lens Reflex)," then submits that form, "I own a Nikon" then submits that form, "I own an 8008" then submits that form, and my server finally generates the appropriate accessory flash form.
With a Java applet, the user's choice on the P&S/SLR menu will affect the choices available on the camera brand menu which in turn will affect the choices available on the camera model menu. Is this better? It will take longer to download. Not only do you have to send the user Java byte codes but also all the text that you have in your database of camera models, only a small portion of which will ever be presented. On the plus side, the user can work in another window while the applet is downloading and, once loaded, the applet is much more responsive than a succession of forms.
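The data structure behind such a set of linked menus might look something like the sketch below. Everything here is an assumption made for illustration: the CameraMenus class, its method names, and the catalog entries other than the Nikon 8008 mentioned above. The point is that the whole catalog sits in the applet's memory, so each menu choice can be answered without a trip to the server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical camera catalog; entries are sample data, not a real database.
class CameraMenus {
    // camera type -> brand -> models, all downloaded with the applet up front
    private final Map<String, Map<String, List<String>>> catalog = new LinkedHashMap<>();

    void add(String type, String brand, String model) {
        catalog.computeIfAbsent(type, t -> new LinkedHashMap<>())
               .computeIfAbsent(brand, b -> new ArrayList<>())
               .add(model);
    }

    // Choices for the brand menu once the user has picked a camera type.
    List<String> brandsFor(String type) {
        Map<String, List<String>> brands = catalog.get(type);
        if (brands == null) {
            return Collections.emptyList();
        }
        return new ArrayList<>(brands.keySet());
    }

    // Choices for the model menu once type and brand have been picked.
    List<String> modelsFor(String type, String brand) {
        Map<String, List<String>> brands = catalog.get(type);
        if (brands == null || !brands.containsKey(brand)) {
            return Collections.emptyList();
        }
        return new ArrayList<>(brands.get(brand));
    }

    public static void main(String[] args) {
        CameraMenus menus = new CameraMenus();
        menus.add("SLR", "Nikon", "8008");
        menus.add("SLR", "Nikon", "Example Model A");       // placeholder
        menus.add("SLR", "ExampleBrand", "Example Model B"); // placeholder
        menus.add("P&S", "ExampleBrand", "Example Model C"); // placeholder
        System.out.println(menus.brandsFor("SLR"));
        System.out.println(menus.modelsFor("SLR", "Nikon"));
    }
}
```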
Some of the user-interface devices on a computer are just not well-suited to the stateless request-response HTTP protocol. Even a continuous network connection might not be good enough unless the Web server and Web client are physically close to each other. Examples of user-interface devices that require real-time response are the mouse, the tablet, and the joystick.
You would not want to use a drawing tool that needed to go out to the network to add a line. An HTML forms-based game might be fun for your brain but it probably won't have the visceral excitement of Doom. Anything remotely like a video game requires code executing on the user's local processor.
Obvious candidates for Java include things like stock tickers and newsfeeds. The user can launch an applet that spends the whole day connected to a quote or headline server and then scrolls text around the screen. Though obvious, these are applications where Java isn't essential. The information provider could just as easily have written a "client-pull" HTML document by adding the following element to the HEAD:
<META HTTP-EQUIV=REFRESH CONTENT="60; URL=update.cgi">
The user's browser will fetch "update.cgi" 60 seconds after grabbing the page with this element.
The need for Java is a bit more pressing in a chat system. You want a new posting to be immediately transmitted to all the participants.
A chat system? A reimplementation of America Online? Is that really all we can do with Java?
Medical records are fragmented. If a patient is treated in five hospitals, he will have five separate electronic medical records. That's understandable. What is unexpected is that even if a patient stays in one hospital, he will generally have several electronic medical records. Each department in a hospital generates and stores its own data to some extent. Whether and to what extent these departmental databases interoperate is determined by technology choices, the network infrastructure, and management politics.
Suppose that you are a programmer given the task of making all the data on one patient available to authorized personnel anywhere inside the hospital. This includes demographics data from the MIS shop, laboratory results, and real-time waveforms from machines in the intensive care unit. The traditional solution would be to design a monolithic computer system, presumably running some kind of relational database management software, into which all departments were required to contribute data. Then you'd budget to develop custom client software to run on all the different kinds of computer around the hospital.
A traditional solution of this form takes a long time to develop. Often it takes so long that the completed system does not fulfill the organization's goals, not because of bugs in the system, but because those goals have changed. Since a huge computer system is extremely expensive to develop, this leaves everyone with a bitter taste in their mouths and reluctant to start over on the next big system.
Anyone with an MBA these days can say "Let's build an intranet!" What would that mean in the hospital? First and most obviously, you abandon the idea of writing custom client software. The user interface to the system will be a Web browser, perhaps sprinkled with Java applets when standard HTML is inappropriate. Now the hospital does not have to worry about Mac/PC/Unix compatibility. Nor will there be any work associated with client computer operating system upgrades. If a department wants to change from Windows 3.1 to Windows NT 4.0 then they just need to get Netscape running; they don't need to ask you to recompile your client software.
Another aspect of intranets is that one doesn't build custom protocols. If you have data to offer, install a Web server and maybe a few API or CGI scripts to make the data available via Hypertext Transfer Protocol (HTTP). If a program needs data then it should get it by making an HTTP GET request from the appropriate server. AOLserver, Apache, NCSA, and Netscape are all better-engineered general purpose servers than you are going to be able to write with your meager programming resources. Stick to worrying about the content that is transmitted in response to the GET.
In our hospital example, this means that the departments don't have to dump their data into a central Oracle database and then rely on that. Each department can continue to use their current system. This flexibility has enormous operational and political appeal. All each department need do is install a Web server to make its data available to other departments. In the case of the MIS department, this is very easy. They can just read the RDBMS-backed Web site chapters in this book and will be up and running a few days later. The intensive care unit, however, has to install a Windows NT or Unix machine next to all the monitoring instruments. This computer will run a Web server and some custom software to grab real-time data from the instruments and make it available via HTTP. A client that wants this data can just say "GET /heart-monitor-stream.tcl?monitor-id=icu2317 HTTP/1.0" and the server will deliver the waveforms as it receives them from the monitor.
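For the curious, here is roughly what such a client could look like in Java. This is a sketch under assumptions: the HeartMonitorClient class is invented, the monitor ID is assumed to need no URL encoding, and the server is assumed to send line-oriented text. The main method only prints the request so that it can run without a hospital attached.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the ICU server described in the text.
class HeartMonitorClient {
    // Builds the HTTP/1.0 request text; a blank line ends the request.
    static String buildRequest(String monitorId) {
        return "GET /heart-monitor-stream.tcl?monitor-id=" + monitorId
                + " HTTP/1.0\r\n\r\n";
    }

    // Sends the request and prints whatever the server streams back.
    static void stream(String host, int port, String monitorId) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(monitorId).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // headers first, then the data
            }
        }
    }

    public static void main(String[] args) {
        // On the hospital network you would call stream("icuweb", 80, "icu2317").
        System.out.print(buildRequest("icu2317"));
    }
}
```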
Is there anything left for you to do? All the data in the hospital is now available by HTTP and all the client machines are running Netscape. A doctor anywhere in the hospital can walk over to a Netscape client machine and find out how Joe Schlebotnik's heart is doing. That is, assuming the first thing that comes to the doctor's mind when he thinks of Joe is "http://icuweb/heart-monitor-stream.tcl?monitor-id=icu2317". Oops. I guess you can't go home yet. You still have to build a patient-centered Web server somewhere. A doctor can query this server to ask "What's the story with Joe?" and the patient-centered server queries all the hospital departments to put together a Joe Schlebotnik Web site on the fly.
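The core of a patient-centered server is just a loop over departments. The sketch below is one guess at its shape, not the system we actually built: the PatientPage class is hypothetical, and the fetcher functions return canned strings where a real one would make an HTTP GET to a departmental Web server.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical aggregator; department names and data are placeholders.
class PatientPage {
    // department name -> function that fetches its data for a given patient
    private final Map<String, Function<String, String>> departments = new LinkedHashMap<>();

    void register(String department, Function<String, String> fetcher) {
        departments.put(department, fetcher);
    }

    // Queries every registered department and assembles one text report.
    String build(String patientName) {
        StringBuilder page = new StringBuilder("Patient: " + patientName + "\n");
        for (Map.Entry<String, Function<String, String>> entry : departments.entrySet()) {
            page.append(entry.getKey())
                .append(": ")
                .append(entry.getValue().apply(patientName))
                .append("\n");
        }
        return page.toString();
    }

    public static void main(String[] args) {
        PatientPage page = new PatientPage();
        // Canned answers stand in for real HTTP requests to each department.
        page.register("MIS", name -> "demographics for " + name);
        page.register("Laboratory", name -> "lab results for " + name);
        page.register("ICU", name -> "see http://icuweb/heart-monitor-stream.tcl?monitor-id=icu2317");
        System.out.print(page.build("Joe Schlebotnik"));
    }
}
```

Because each department is only a function from patient to text, adding a new department means registering one more fetcher rather than redesigning a central database.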
Here again you want to use standard Web technology. It would be painful, tedious, and bug-prone to write a custom program to take the streaming data from the ICU and pass it into a Java applet running inside the doctor's Netscape Navigator. So instead you install a flexible Web server program on your conglomeration server and write a small Tcl or Java program to run inside its API. Figure 8-3 shows how the pieces fit together.
Figure 8-3: A hospital intranet using a conglomeration server
Is this practical today? Just barely. My friend Zak and I have been building medical Web/database systems like this at Boston's Children's Hospital and MIT since about 1994. Children's runs a 60GB Oracle installation which made it very easy to get interesting data. Even the first prototypes that we kludged together in Oraperl CGI scripts worked better than a lot of the custom Mac and Windows apps that medical information systems vendors were selling.
Zak's latest thinking in this area involves using the Netscape Enterprise Server as the conglomeration server. His group has added a thin layer of Java to the Enterprise Server for taking in real-time data streams and then serving them back out. In contrast to the unreliable client-side Java implementations with their bug-ridden window systems, server-side Java seems quite stable.
You can check out the system at http://w3health.com/
My final comment is a cautionary one. Don't get too excited by the possibility of offering a rich custom user interface with Java. Adobe PhotoShop has a beautiful user interface but it took them hundreds of person-years to perfect it. It takes them hundreds of person-years to test each new version. It costs them millions of dollars to write documentation and prepare tutorials. It takes users hours to learn how to use the program. You don't have a huge staff of programmers to concentrate on a single application. You don't have a full-time quality assurance staff. You don't have a budget for writing documentation and tutorials. Even if you did have all of those things, your users don't have extra hours to spend learning how to use the Web site that you build. Either they are experienced Web users and they want something that works like other sites or they are naïve users who will want their effort in learning to use Netscape and your site to pay off when they visit other sites.
Does that mean I'm suggesting that your Web site be 100% static? Au contraire! Read the next chapter to find out how to write server-side programs.
Note: If you like this book you can move on to Chapter 9.