for the Web Tools Review by Philip Greenspun
May 2005 Update: I'm not actively maintaining this page anymore but I did just recently try a product called "Word to Web" (WordToWeb 2.5) to see if it would produce HTML that is less complex and easier to edit by hand than what Word 2003 produces on its own. The answer is "no". In fact, it did a much worse job than Microsoft Word by itself.
The Academic version of Office comes only on floppy disk; 32 of them. It took me about two hours and 120 megabytes of disk space to install the complete package.
I started Word 6.0.1, the new, fixed, zippy version of Word that was supposed to address the program's sluggishness on the Macintosh. PowerMac-native Word 6 ran substantially slower than Word 5 ran in emulation on my PowerMac.
"OK, so it is a major pig," I thought to myself, "and proves once again that C programs beyond a certain complexity are neither fast nor small. Still, it will be nice to be able to Save As Text with Layout without crashing the machine."
I tried saving a simple 10-page paper with no figures or equations or anything special (beyond two footnotes) as Text with Layout. An error box appeared. No output file was produced. My machine crashed a few seconds later.
"Oh well, so two years and all the C programmers Bill Gates could bring in from India weren't enough to squash this bug. At least I can play with Internet Assistant."
It turns out that Internet Assistant only runs on the PC version of Word.
Did I feel like I'd been cheated out of $135? Hell no! Aficionados of viewgraph design claim that Aldus Persuasion is way better, but PowerPoint produced a nice stack of colored viewgraphs for a conference talk. Oh, this PowerMac native C program is real zippy. On a 66 MHz PowerMac, it was barely able to keep up with my typing in certain modes.
As soon as I had rolled up my prayer rug, though, I noticed that the HTML output from Word/Internet Assistant was garbage. For example, it would start to wrap a headline in an H2 tag but then forget to close it, so huge blocks of text were rendered as a headline. There were hundreds of extraneous PRE, BR, and P tags. Worse than useless.
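If you want to find out how bad the damage is before throwing the output away, a few lines of script will locate the headlines that never get closed. What follows is only a minimal Python sketch using the standard library's html.parser; the names HeadingChecker and unclosed_headings are made up for illustration, and it assumes that headings are never nested, so a heading still open when the next one starts (or when the file ends) counts as unclosed.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingChecker(HTMLParser):
    """Collects headings that are opened but never closed (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_heading = None  # (tag, line number) of the heading currently open
        self.problems = []        # headings that were never closed

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H2> arrives here as "h2".
        if tag in HEADINGS:
            if self.open_heading is not None:
                # A new heading started before the previous one was closed.
                self.problems.append(self.open_heading)
            self.open_heading = (tag, self.getpos()[0])

    def handle_endtag(self, tag):
        if self.open_heading is not None and self.open_heading[0] == tag:
            self.open_heading = None

    def close(self):
        super().close()
        # Anything still open at end of input was never closed.
        if self.open_heading is not None:
            self.problems.append(self.open_heading)
            self.open_heading = None


def unclosed_headings(html):
    """Return a list of (tag, line) pairs for headings missing their end tag."""
    checker = HeadingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Calling unclosed_headings on the text of a converted file returns one (tag, line) pair per runaway headline; an empty list means every heading was closed.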
"At least I can go back to my old way of doing things," I mumbled to myself, "I'll just Save As RTF and then use rtftohtml." No such luck. The latest version of Word, at least with Internet Assistant installed, puts some crud in RTF files that rtftohtml doesn't understand.
perl -i.bak -pe "s//'/g" should fix it up. Less easily fixed are equations. I tried translating my pathetic master's thesis, which has a lot of equations in the text. These all ended up mangled. Despite these nits, HTML Transit is by far the best tool available for translating Microsoft Word into HTML. It also understands Interleaf, Frame, WordPerfect, RTF and a bunch of other formats (not PageMaker though). Unfortunately, it only runs on Windows.
The document was filled with special 8-bit characters that aren't part of the legal HTML character set, e.g., "smart quotes" and long dashes. There was an extra "<P> </P>" in between all of the paragraphs.
<B><FONT FACE="Arial" SIZE=4><P>
In short, a terrible unusable non-standard mess reflecting a complete ignorance of the original point of HTML (that the browser does the formatting).
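Two of these problems, the smart punctuation and the empty paragraphs, are mechanical enough to clean up with a script. Here is a rough Python sketch of the idea. It assumes the file is in the Windows-1252 encoding (where smart quotes and long dashes live at bytes 0x91 through 0x97), the ASCII stand-ins are my own choice, and the function name clean_word_html is made up for illustration. It does nothing about the FONT tags.

```python
import re

# "Smart" punctuation mapped to plain ASCII stand-ins.
# The stand-ins are an assumption; pick whatever reads best to you.
SMART_TO_ASCII = {
    "\u2018": "'",   # left single quote  (byte 0x91 in Windows-1252)
    "\u2019": "'",   # right single quote (0x92)
    "\u201c": '"',   # left double quote  (0x93)
    "\u201d": '"',   # right double quote (0x94)
    "\u2013": "-",   # en dash            (0x96)
    "\u2014": "--",  # long (em) dash     (0x97)
}

# A paragraph containing nothing but whitespace or &nbsp;, plus trailing whitespace.
EMPTY_PARAGRAPH = re.compile(r"<P>(?:\s|&nbsp;)*</P>\s*", re.IGNORECASE)


def clean_word_html(raw: bytes) -> str:
    """Decode Word's output, flatten smart punctuation, drop empty paragraphs."""
    # Bytes that Windows-1252 leaves undefined become U+FFFD instead of raising.
    text = raw.decode("cp1252", errors="replace")
    for fancy, plain in SMART_TO_ASCII.items():
        text = text.replace(fancy, plain)
    return EMPTY_PARAGRAPH.sub("", text)
```

Feed it the raw bytes of the saved file, e.g. clean_word_html(open("paper.htm", "rb").read()), where paper.htm is a placeholder filename.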
... Well, I've tried Frame 5.5 now. It includes an impressive array of controls for converting documents to HTML and is way better than the plug-in that Adobe distributed with 5.0. I think that the philosophy and overall power are similar to what you get with InfoAccess. One truly impressive thing that Frame will do is simultaneously output a cascading style sheet so that the final HTML stays reasonably clean yet readers with modern browsers can get the benefits of design choices you've made. (Note: I wasn't able to finish the on-line edition of Database Backed Web Sites because Macmillan didn't supply me with something that would load cleanly into Frame and display. But whatever I saw in Frame was ultimately viewable in the final HTML document.)
The bottom line on Frame is that it remains an extremely powerful way to manage a set of documents that you intend to make available simultaneously in print, PDF, and HTML. But in order to get the most out of Frame, you'll have to invest a few days thinking about styles and what they should mean. Frame is a good enough piece of software that there are actually rewards to taking an intelligent and formal approach to your problem. But if you want to be stupid, you can think of Frame as a version of Microsoft Word with most of the bugs taken out.