This one is for the Internet app nerds among the readers…
The student teams in 6.171, the software engineering class that I’m teaching this semester at MIT, are required to document their servers. By the end of the term, they are supposed to have something more or less like http://philip.greenspun.com/doc/ (the doc directory for my personal Web site; rather bloated because it is based on a toolkit that is much more powerful than the features I’m using). I would have expected them to write their documents in HTML. One team has chosen to do their documentation in LaTeX, output to PDF. I personally hate it when information is only available in PDF, but can’t really say why. They don’t need equations or anything fancy. HTML would suit them fine, but they apparently find it easier to write in LaTeX.
I always think that if a Web developer can’t write HTML by hand in his or her sleep, he or she probably isn’t very good. So the use of Microsoft Word or some other tool to author HTML is a telltale sign of incompetence. Is this just prejudice? On what grounds can I tell these folks that a Web site should be documented in HTML?
[Fun experiment: Do a Google search for “latex” and compare the pages returned with the ads on the side…]
[Update after a few days of reflection: I think I finally figured it out. The first and most obvious answer is that documentation for Web systems needs a lot of hyperlinks, and therefore HTML is better than PDF. The deeper answer is that the students don’t realize that they are supposed to be software engineers and not students. A student turns in a paper that will never be updated. LaTeX is great for that, as it was designed for journal papers that were never updated, except maybe by the author. Internet applications are fluid, however, and they get updated frequently, which requires correspondingly frequent updates to the documentation. The students who decided to use LaTeX are implementing their service in Microsoft ASP.NET. Eventually it will be taken over by some Microsoft-certified programmer. Even supposing that they provide the LaTeX source (right now they just have the PDF on the server), what are the chances that this person will have heard of LaTeX or know how to use it? By contrast, if you document a Web service in HTML, you know that whoever takes it over will be able to edit it, because nobody ignorant of HTML would be touching a Web service. (Of course, if the HTML had been authored in Microsoft Word, the person needing to edit it would curse you, because there would be so much extraneous garbage to wade through.)
So… LaTeX/PDF is good for a student turning in an assignment. Hand-authored HTML is good for documentation that you expect some future programmer to take over and edit.]
Sure, you can use HTML for documentation, and it requires relatively little overhead if you’re already making something website-related. However, HTML also sucks for any sort of long-form document. You get exactly zero of the things you’d want to use to manage a piece of documentation (cross-referencing, citations, glossaries, indexing, etc.) that LaTeX has spent years building up. Even something like DocBook seems to lack a lot of the automated toolchain that you take for granted in LaTeX. If the PDF thing bothers you so much, make them use LaTeX2HTML or something.
There is a LaTeX to HTML converter, isn’t there?
Yes, it’s called TeX4ht.
How about a simple functional requirement like: “The documentation should be in a format that is easy to view and navigate in IE or Firefox without requiring the use of third-party plug-ins…”
I’m a little biased, having gone through the MIT system before the web was the de facto publishing standard. But I despise how ugly HTML documentation is. Aesthetically, I really prefer documentation that’s created by a real layout system, and for geeks, LaTeX qualifies. I used to typeset everything that had equations in LaTeX, because it produces unrivaled clarity for equations. My biggest complaint about LaTeX is that it’s difficult to customize, so every paper, thesis, and problem set ends up looking identical unless the author puts in a lot of work to change the defaults. Also, embedded images can be ugly if care isn’t taken to set them nicely in the flow.
Regarding PDF as a publishing medium, I don’t mind it too much for something that doesn’t have a lot of links. Self-contained documents work fine in PDF. They are easily searchable in modern readers, and you can even select text out of them, etc. They usually print beautifully on those occasions when printing is called for. As a Mac OS X user, I find PDF a very native format, which makes PDFs as fast as text files to render and read.
So I’d say there are benefits to both, but it might be a mistake to discount the LaTeX/PDF combo. Sometimes it’s the right tool for the job.
Byron: They are not writing documents longer than 5 pages or so.
Actually, I think I figured out to some extent why I am against LaTeX/PDF. I think the docs should contain hyperlinks to directories on the server and to documentation elsewhere on the Web. I guess it is possible to have hyperlinks from PDF files to arbitrary URLs, but it doesn’t seem to be commonly or smoothly done.
Perhaps Jakob Nielsen’s Alertbox “PDF: Unfit for Human Consumption” (http://www.useit.com/alertbox/20030714.html) will explain why you, and so many others, hate it when documents are only available in PDF.
But, regarding your question: good web developers should be able to write HTML in their sleep. That doesn’t necessarily make writing HTML directly the best choice. They should be allowed to use any tools that result in a quality web site containing their documentation: Emacs, Doxygen, LaTeX with an HTML backend, custom scripts extracting documentation from their source code, whatever.
Developer beware, though: MS Word usually generates nasty, nasty HTML, and with HTML converted from LaTeX, the problem is usually getting the content divided properly into a sensible set of pages.
Under 5 pages, needs hyperlinks; why is this even under discussion? I love LaTeX, but sometimes it’s not the right tool for the job. Yes, you can use it to create PDFs with hyperlinks, but it’s not completely trivial.
Anyway, you’re talking about five pages here. That’s, what, ten minutes worth of work to reformat? Demand HTML, and if they want to also present you with a beautifully bound edition for your collection, accept it graciously.
But I bet you they’ll just use \documentclass{article} and Computer Modern (Knuth’s one great mistake). If you can’t be bothered to typeset something properly, best not to typeset it at all.
I think LaTeX is a great way to approach it. You basically end up with a good cross-media solution. You’ll have a pretty PDF, and the LaTeX source can be converted into HTML by any one of the many available scripts suited for the task.
Have they considered DocBook?
I would prefer HTML documentation, particularly handwritten HTML. I’d also prefer the original Mona Lisa on my wall instead of a print. The difference is that, unlike paintings, in the case of documentation handwritten HTML is ultimately easier for the author, not harder.
Do they want to do LaTeX/PDF because they’re trying to develop another skill set? I have been trying to teach myself something similar, using Adobe InDesign and a Wacom tablet to take notes in medical school. It is entirely too time-consuming, and I’m right back to pencil and paper. I learned a skill, but the overhead time cost was not sustainable because the rate of the actual work (getting the information into my head) plateaued at an unacceptably low level. In other words, as I got better with InDesign I realized that even a skilled user can only produce at a certain rate, which I estimate to be five times slower than pencil and paper.
After all, LaTeX is a scientific (mathematical) typesetting system, not just “another tool” like MS Word. So, if you teach a university course, let them write in LaTeX. I think the knowledge required to write LaTeX (in contrast to the knowledge required to write in Word) is comparable to, if not greater than, the knowledge required to write HTML. To really judge whether those students are smart, however, have a look at the LaTeX source.
I can’t believe how much the quantity AND quality of our documentation has risen since we started storing it in a Wiki (any old one will do, the simpler the better I reckon). It’s even simpler than HTML, is searchable with zero effort, it keeps track of revisions and most of all it is very easy to update, thus making it more likely to be kept up-to-date.
The difference, surely, is that LaTeX is designed to be easy to type, and HTML isn’t (having to type out closing tags in full, all those fiddly foo=”bar” etc.). Of course, documentation is supposed to be designed for the reader, and so HTML probably is the better choice here.
Having a requirement for hyperlinked HTML seems perfectly reasonable for this sort of thing. This doesn’t prohibit them from using LaTeX, but in my experience, generating reasonably nice HTML documentation from LaTeX would be more work than just writing the thing in HTML in the first place if the doc is only five pages.
PDFs suck for viewing on screen. They are great if you want to package printable materials for other people’s consumption, but on screen they are a great pain. I’ve felt comfortable with PDFs on the screen exactly twice: Once when I was reading a document printed in landscape mode, as my laptop’s screen is larger than a piece of paper, and once when I was reading a PDF on a very large LCD that was rotated into the portrait mode.
Which brings me to another point: Somebody needs to produce a simple word processor that uses MHTML as its native format. 99 percent of the documents I send or receive are never printed out, so Word’s fancy capabilities are wasted on me. The only reason why I still use it is because most WYSIWYG HTML editors still insist on keeping images in separate documents, and are generally geared towards websites, not documents.
What else can I say?
LaTeX2HTML works fine and outputs HTML from LaTeX, which is what you want (apparently). I’d personally rather write in LaTeX or DocBook XML than in straight HTML, just because these “formats” provide stricter structure than HTML.
You can write HTML/CSS without any compilers.
That’s a beauty, isn’t it?
I noticed that LaTeX has better fonts than those bundled with Windows, and that it is more often chosen over Word by those who want to include many mathematical formulas, because of its claimed productivity for doing so, and by those who want their work printed.
I’ve come to not mind PDF, perhaps because much Web content tends to be brief, random thoughts with lots of links, and the few things I see in PDF tend to be papers written in well-organized prose, which doesn’t encourage you to keep popping other windows open. But 5 pages of documentation is a different story. And I used to find PDFs somewhat irritating.
LaTeX is a great choice because they can regenerate the output in as many formats as you need, including XHTML. If they are using something like LyX, it will keep them from doing stupid stuff with the markup and force them to concentrate on the content.
Your question is, “On what grounds can I tell these folks that a Web site should be documented in HTML?”
Because HTML is the lingua franca of the Web. It is the native format of the Web.
LaTeX’s native environment is a printed piece of paper.
If the student wants something that can be easily repurposed, there is always XML or rows in an RDBMS.
More practical reasons:
-You can’t link into a specific page or text chunk in a PDF file. Or, if you can, this is an esoteric and rarely-used feature.
-HTML is much easier to full text index than PDF. In fact, only the rocket scientists at Google seem to have figured out how.
-HTML loads much faster than PDF, encouraging much more frequent consultation of the documentation.
But I’m just a Web user who happens to hack together several Web applications, not a Software Engineer who deigns to write for the Web.
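The full-text indexing point above can be illustrated with a minimal sketch in Python, using only the standard library’s html.parser. It assumes the documentation pages are already loaded into a dictionary mapping page name to HTML source (the page names are hypothetical) and does a simple AND search with no stemming or ranking; a real site would more likely hand its pages to an existing search engine.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

WORD = re.compile(r"[a-z0-9]+")


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_words(source):
    """Lowercased words appearing in the visible text of one HTML page."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    return WORD.findall(" ".join(parser.chunks).lower())


def build_index(pages):
    """Inverted index: word -> set of page names whose text contains it."""
    index = defaultdict(set)
    for name, source in pages.items():
        for word in page_words(source):
            index[word].add(name)
    return index


def search(index, query):
    """Pages containing every word of the query (simple AND search, no ranking)."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical documentation pages, inlined to keep the sketch self-contained.
pages = {
    "chat.html": "<h1>Chat module</h1><style>h1 {color: red}</style>"
                 "<p>Messages are stored in the database.</p>",
    "install.html": "<p>Create the database before starting the server.</p>",
}
index = build_index(pages)
print(sorted(search(index, "database")))  # ['chat.html', 'install.html']
```

Doing the same for PDF would first require a separate text-extraction step, which is the extra friction the comment is pointing at.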
I actually prefer docs to be available easily in both HTML and PDF. As I get older, I don’t like the eyestrain of reading manuals on monitors all the time, and prefer to print docs. (I suppose HTML + @media print CSS would do fine, but PDF docs really are sharper.) Anyway, I used DocBook on an earlier project, and liked its output to both HTML and PDF. The thing I find most objectionable about LaTeX is that it’s definitely a formatting-biased markup, rather than a structure-biased markup. At least DocBook is structure-based. I would have to come down against LaTeX too, in this case.
The strongest grounds in favor of HTML are in the usability arena, as noted above. The documentation should contain links out, and links back up to the table of contents. I’d require that.
If the students want to knock themselves out to make this work in some other format, I guess that’s okay, as long as they don’t come complaining that they didn’t have enough time to finish the other parts of the assignment.
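A requirement like “links out, and links back up to the table of contents” is easy to check mechanically when the docs are HTML. Here is a rough Python sketch, assuming the table of contents lives at index.html (a hypothetical name, compared literally, so a relative path like ../index.html would not count) and treating any http or https href as a link out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Record the href of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def check_page(source, toc_href="index.html"):
    """Return a list of problems with one documentation page (empty if none).

    toc_href is an assumed name for the table of contents; it is compared literally.
    """
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    problems = []
    if toc_href not in collector.hrefs:
        problems.append("no link back to " + toc_href)
    if not any(urlparse(h).scheme in ("http", "https") for h in collector.hrefs):
        problems.append("no links out to other sites")
    return problems


good = '<p><a href="index.html">Contents</a> | <a href="http://www.w3.org/">W3C</a></p>'
print(check_page(good))  # []
print(check_page("<p>No links at all.</p>"))
# ['no link back to index.html', 'no links out to other sites']
```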
Is it normal to tow a car with oxen?
Although print-oriented documentation is fine and dandy, it goes against the obvious point of the class. It’s not that difficult to write HTML. If one finds typing tags annoying, it’s not that difficult to use or write tools like Markdown. They’re certainly not going the extra mile one would expect from students at MIT.
Philip: Oh, in that case why didn’t they just use TROFF? 😉 I like the Wiki idea, though. Not quite as nice with regard to managing bits-n-pieces, but less verbose/annoying than pure (X)HTML+CSS, with reasonable results.
Manni (and others, from flashing through the comments): TeX isn’t really a markup language at all; it’s a (relatively simple) programming language that happens to drive a typesetter. LaTeX, the most common package, happens to look sort of like a markup language, what with the \begin and \end statements everywhere, but it isn’t really. Most importantly, you can’t actually parse a LaTeX document and necessarily know what will happen without “executing” it.
Philip, I can’t believe you’re really “against” the use of LaTeX. That’s like saying you object to use of high level languages for programming. Why not also demand that your students write their apps in assembly language? If you can’t construct a good argument against the _result_ (i.e. the user experience of the end product) of their work with LaTeX, then your anti-LaTeX stance hasn’t a leg to stand on. I’ll stand elbow-to-elbow with you on the anti-PDF picket line, and I doubt anyone would object to a class requirement to demonstrate competency with HTML, but requiring your students not to use the tools with which they’re most productive for things which don’t necessarily require HTML finesse? Let’s also get out the sticks and stone axes. 🙂
For what it’s worth, I agree with the idea that if the documentation is expected to be less than 5 pages, contain hyperlinks, have no need for fancy equations, and be viewable in the same browser as the rest of the website, HTML is the right tool for the job.
It sounds like you are not asking the students to write a thesis, or any paper that will end up published in the dead-tree world, but website documentation that will need to be easily viewable online from within the same environment as the website it documents, so require the work to be in a format that meets these requirements. I am not saying that they need to write all the HTML by hand, as there are plenty of tools available to convert from LaTeX into HTML (a few of which have been mentioned in these comments already).
Finally, my personal angle… I hate clicking on a link in a website that opens a PDF. Losing control of my computer while the PDF viewer loads is totally unacceptable. Personally, I think the place for PDF docs on a website is for material that requires that format, such as technical papers or documentation that is expected to be printed. Basically, use PDF when you want the user to print the document, and HTML when you want the user to view it online. Clicking on a link that claims to be website documentation and then losing control of my computer until the PDF viewer is finished with it is enough to make me abandon one site in favor of another where the developer was willing to spend a little extra time to create documentation that I can read while still doing the 5 other tasks I am trying to perform with my computer. The point is: think of the users of the site. Are they going to want or require the documentation in PDF, or will they get as mad as I do while waiting for the PDF viewer to load?
So in summary, I vote to make the requirement HTML. Turning the final work in as a PDF is like using a screwdriver to pound in a nail: it’s the perfect tool to do a job… just not this one.
+1 on letting them use a wiki. A couple open source projects I’m on switched to doing docs completely in wiki, and the resulting increase in documentation is incredible. In many ways wiki is what html was _meant_ to be, a very simple mark up language that anyone can use to make a web page.
As for the original question, there is a _bit_ of a basis for prejudice against an HTML generation tool, since if people are using different generators, diffs in version control won’t align right. But letting them go LaTeX -> HTML with a generator seems valid, as long as the team is aligned on everyone using LaTeX, storing the sources in version control, and having the build always generate new HTML docs from the LaTeX sources. LaTeX -> PDF seems silly, as it’s just annoying on the web. Yes, you can do links, but for me they open in IE, my non-default browser.
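The build rule described above (always regenerate the HTML from the LaTeX sources kept in version control) might look something like this Python sketch. It assumes TeX4ht’s make4ht is installed and writes foo.html next to foo.tex; that command, the doc directory name, and the flat directory layout are all assumptions to adjust for a real project.

```python
import subprocess
from pathlib import Path

# Assumption: TeX4ht's make4ht is on the PATH and writes foo.html next to foo.tex.
CONVERTER = ["make4ht"]


def stale_sources(doc_dir):
    """Return the .tex files whose .html output is missing or older than the source."""
    stale = []
    for tex in sorted(Path(doc_dir).glob("*.tex")):
        html = tex.with_suffix(".html")
        if not html.exists() or html.stat().st_mtime < tex.stat().st_mtime:
            stale.append(tex)
    return stale


def rebuild(doc_dir):
    """Regenerate HTML for every stale source; check=True stops on the first failure."""
    for tex in stale_sources(doc_dir):
        subprocess.run(CONVERTER + [tex.name], cwd=tex.parent, check=True)


if __name__ == "__main__":
    rebuild("doc")  # hypothetical directory holding the LaTeX sources
```

Running it from a makefile or commit hook keeps the HTML on the server from drifting away from the LaTeX sources.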
Okay, I finally figured it out and updated the main posting. PDF is bad because it can’t be edited. Computer programs change. Internet applications change even more rapidly than other computer programs. A native HTML document means that the programmer taking it over and updating it will have the source code. Any programmer who is working on a Web site will know HTML. That’s why all docs have to be in HTML (except maybe some figures/images).
I am not sure if I agree that either hand-crafted html or hand-crafted LaTeX is the best approach when documenting code. The inevitable problem arises as the code base gradually diverges from the documentation until they soon end up describing quite different things.
I am personally in favour of using tools like Doxygen, which covers many, but not all, programming paradigms. Can we assume these students are working on OpenACS using Tcl? I love Tcl to death, but Doxygen may not be too helpful here. Perhaps this suggests the need for an OpenACS documentation tool that would allow students (and “real” developers) to document their code in a formal way, so that consistent documentation could be “generated” in LaTeX, PDF, HTML (or even ODT), depending on the requirement.
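For Python code, a crude stand-in for this kind of tool can be built on the standard library’s ast module. The sketch below is not Doxygen and does nothing for Tcl; it only pulls the module docstring and the top-level class and function docstrings into an HTML page, flagging undocumented definitions. As the next comment points out, generated reference docs like this still do not capture the higher-level explanation of why the code exists.

```python
import ast
import html


def docs_to_html(source, title):
    """Render the module docstring and top-level class/function docstrings as HTML.

    Nested definitions are ignored; undocumented ones are flagged so gaps show up.
    """
    tree = ast.parse(source)
    parts = ["<h1>" + html.escape(title) + "</h1>"]
    module_doc = ast.get_docstring(tree)
    if module_doc:
        parts.append("<p>" + html.escape(module_doc) + "</p>")
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node) or "(undocumented)"
            parts.append("<h2>" + html.escape(node.name) + "</h2>")
            parts.append("<p>" + html.escape(doc) + "</p>")
    return "\n".join(parts)


# A hypothetical module, inlined as a string so the sketch runs on its own.
example = '''"""Chat room storage."""

def post(room, msg):
    """Store msg in room and return its id."""

def purge(room):
    pass
'''
print(docs_to_html(example, "chat"))
```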
Bob: Certainly none of the students are using AOLserver, Tcl, or ACS. I doubt that any of them have heard of any of those systems. The students are 21 years old. The Doxygen system you mention does not seem to cover the kind of information I want them to write, which is generally much higher level and includes an explanation of why they wrote some code in the first place. http://philip.greenspun.com/doc/chat is an example of what I’d like to see for a module, for example.
I’d like to second the suggestion of using Markdown. You get a readable plain-text version of whatever you’re writing for free, and it saves gobs of time. Also, being able to type, for example, backquotes instead of <code> all the time invites lazy types like myself to add rich formatting.
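To show why the backquote shortcut saves typing, here is a Python sketch of only Markdown’s inline-code rule: escape the text for HTML, then turn each backquoted span into a <code> element. It is not a Markdown implementation (no emphasis, links, lists, or multi-backquote spans); for real use, pick an existing Markdown library.

```python
import html
import re

# A span of text between single backquotes, as in Markdown's inline code.
CODE_SPAN = re.compile(r"`([^`]+)`")


def inline_code(text):
    """Escape text for HTML, then turn each `span` into a <code> element.

    This is only Markdown's basic inline-code rule, not a Markdown implementation.
    """
    escaped = html.escape(text, quote=False)
    return CODE_SPAN.sub(r"<code>\1</code>", escaped)


print(inline_code("Call `purge()` when `count > 100`."))
# Call <code>purge()</code> when <code>count &gt; 100</code>.
```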
We’re using the Instiki wiki, and it’s working beautifully. One of the reasons people don’t document enough is that it’s inconvenient. The wiki reduces resistance. It’s also easy for our non-technical staff to use, and they do need to document their work and processes. The wiki gives them a searchable, hyper-linked, versioned system that’s about as easy to use as a word processor.
For more elaborate organization we’re looking at Moin, which offers embedded macros with which you can bring chunks of text together, generate local TOCs, etc. We’re also looking at Trac, which is a wiki-based front-end to Subversion. The Ruby on Rails developers use Trac, and it’s written in Python. So maybe we all *can* get along, after all. (Nah.)
One advantage of hand-crafted HTML over Word or text documents is that it feels like programming. A programmer hates writing documentation – but maintaining a program is something else again. A wiki is even worse – it feels like using a version control system. Programmers hate version control even worse than documentation. At least you can ignore documentation – but a version control system usually lies right in the path of getting the job done. 🙂
I think that you should consider forcing them to (1) write their documentation in strict, semantic XHTML, (2) format the document using CSS, and (3) publish an RSS feed that lets users know when their content has changed. The students will learn a lot by doing this and they can write converters from XHTML into any format their hearts desire. Ultimately, being facile with XHTML, CSS, and RSS will be a lot more useful to them than becoming experts at LaTeX.
And a class award for best documentation might help motivate the teams.
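For the RSS part of the XHTML/CSS/RSS suggestion above, a change feed can be generated with Python’s standard library alone. This is a minimal sketch, assuming the team keeps a list of (title, URL, timestamp, summary) entries for each documentation edit; the site URL and entry below are hypothetical, and a real feed would probably also want guid elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def changes_feed(site_url, changes):
    """Build a minimal RSS 2.0 feed from (title, url, when, summary) tuples.

    `when` should be a timezone-aware datetime; RSS 2.0 expects RFC 822 dates.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Documentation changes"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Recent edits to the server documentation"
    for title, url, when, summary in changes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "pubDate").text = format_datetime(when)
        ET.SubElement(item, "description").text = summary
    return ET.tostring(rss, encoding="unicode")


# Hypothetical site and change log.
feed = changes_feed(
    "http://example.com/doc/",
    [("Chat: documented the purge page", "http://example.com/doc/chat",
      datetime(2006, 3, 1, 12, 0, tzinfo=timezone.utc),
      "Explains why old messages are purged nightly.")],
)
print(feed)
```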