Discussion

part of Software Engineering for Internet Applications by Eve Andersson, Philip Greenspun, and Andrew Grumet; revised February 2005
A discussion forum is one of the most basic tools for computer-supported cooperation among human beings. User A can post a question. User B can post an answer. User C can view both question and answer and learn from the exchange. In a threaded forum, User D has the choice of posting a response to User A's question or to User B's response. In a Q&A format forum, Users D, E, and F can post responses to User A's question, and the responses will simply be presented in the order that they were submitted. With minor tweaks to the presentation layer, a discussion forum system can function as a personal commentable weblog.

In this chapter you'll prototype a discussion forum, conduct a usability test, and then refine your system based on what you learned from observing the users.

Discussion Forum as Community?

A well-designed discussion forum can by itself fulfill all of the requirements for a sustainable online learning community. Recall that these elements are the following:
  1. magnet content authored by experts
  2. means of collaboration
  3. powerful facilities for browsing and searching both magnet content and contributed content
  4. means of delegation of moderation
  5. means of identifying members who are imposing an undue burden on the community and ways of changing their behavior and/or excluding them from the community without them realizing it
  6. means of software extension by community members themselves
Aviation in itself is not inherently dangerous. But to an even greater degree than the sea, it is terribly unforgiving of any carelessness, incapacity or neglect.
-- Captain A. G. Lamplugh, 1930s
An early example of the forum-as-community is USENET, which was started in 1979 and is also known to old people as "NetNews" and to young people as "Google Groups". Each newsgroup is a more or less self-contained community of people interested in a particular topic, collaborating through a threaded discussion forum. A good example is rec.aviation.soaring, where people talk about flying around in airplanes without engines.

In a USENET group the magnet content can be any longish posting from a recognized expert. Keep in mind that the number of people using a group such as rec.aviation.soaring is fairly small—most people get nervous in little planes and even more nervous in a little plane with no engine. An analysis of October 2004's activity by Marc Smith's Netscan service (netscan.research.microsoft.com) shows that the group had only 174 "Returnees". Thus it will be fairly straightforward for these core users to recognize each other by name or email address. A typical magnet content posting in a newsgroup is the FAQ or frequently asked questions summary in which each question has an agreed-upon-by-the-group-experts answer.
If the engine stops for any reason, you are due to tumble, and that's all there is to it!
— Clyde Cessna

The means of collaboration in the USENET group is the ability for any member to start a new thread or reply to a message within an existing thread. In the early days of USENET, the means of browsing and searching were reasonably good for recent messages, but terrible or non-existent for learning from older exchanges. Starting in the mid-1990s, Web-based search engines such as DejaNews provided fast and easy access to old messages.

USENET has traditionally been weak on the fourth required element ("means of delegation of moderation"). Not enough people have volunteered to moderate, software to divide the effort of moderating a single forum among multiple moderators was non-existent, and the news protocols had security holes that let commercial spam messages through even on moderated groups. For an overview of the circa 2001 state of the art, read http://www.landfield.com/usenet/moderators/handbook/. For a discussion of spam in history, see "Origin of the term 'spam' to mean net abuse" by Brad Templeton at http://www.templetons.com/brad/spamterm.html, a site that contains a lot of other interesting articles on the history of the Internet.
Flying is inherently dangerous. We like to gloss that over with clever rhetoric and comforting statistics, but these facts remain: gravity is constant and powerful, and speed kills. In combination, they are particularly destructive.
— Dan Manningham

Where USENET has fallen tragically short is element 5: "Means of excluding burdensome people." Most USENET clients include "bozo filters" that enable an individual user to filter out messages from a persistently troublesome poster. But there is no collective way for a group to exclude a person who consistently starts irrelevant threads, spams the group, abuses others, or otherwise becomes unwelcome.

With regard to element 6, software extension by community members themselves, USENET has a mixed record. USENET servers and clients tend to be monolithic C programs where small modifications can have catastrophic consequences. On the other hand, the average user of the early Internet was a skilled software developer. So if not every USENET user was a programmer of USENET tools, it was at least safe to say that every programmer of USENET tools was a user of USENET.

Beyond USENET

If the online learning community that you build is only as good as USENET, congratulate yourself. The Google USENET archive contains 700 million messages from twenty years. Hundreds of thousands of people have gotten the answers to their questions, as shown in Figure 8.1.

Figure 8.1: A December 25, 2001 USENET exchange in the group rec.aviation.soaring regarding mounting a camera on the wing of a glider. Notice that the first answer comes less than two hours after the question was posted.

When building our own database-backed discussion forum system, there are some simple improvements that we can add over the traditional USENET system:

More dramatic improvements can be obtained with attention to element 5: "Means of excluding burdensome people." Your software can do the SQL query "show me users who've submitted questions that were deleted by a moderator as redundant" and then automatically welcome those users back to the forum with an interstitial page explaining how to search and browse archived threads. If the online community is short on moderator time, it will make a lot of sense to query for those users whose postings have resulted in moderator intervention. If it turns out that 0.1 percent of the users consume 50 percent of the moderators' time, perhaps it is better to ban that handful of users and thereby double the community's available moderation resources.
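A minimal sketch of the two queries described above. It assumes a hypothetical moderation log table, moderator_actions, which is not part of any data model defined in this text; the table name, the column names, and the 'redundant' reason code are all invented for illustration.

```sql
-- hypothetical log of moderator interventions; one row per action taken
create table moderator_actions (
       action_id     integer primary key,
       content_id    integer not null,    -- the posting acted upon
       author_id     integer not null,    -- the user who wrote the posting
       moderator_id  integer not null,
       action        varchar(20) not null
                     check (action in ('delete', 'edit', 'move')),
       reason        varchar(20),         -- e.g., 'redundant', 'off_topic', 'spam'
       action_date   date not null
);

-- "show me users who've submitted questions that were deleted
-- by a moderator as redundant"
select author_id, count(*) as n_redundant
from moderator_actions
where action = 'delete'
and reason = 'redundant'
group by author_id
order by n_redundant desc;

-- which users account for the most moderator interventions overall?
select author_id, count(*) as n_interventions
from moderator_actions
group by author_id
order by n_interventions desc;
```

A login page script could run the first query for the connecting user to decide whether to show the interstitial page. Comparing the top few rows of the second query against the total row count gives a rough idea of how concentrated the moderation burden is.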

As the semester proceeds, you'll discover another advantage of building your own discussion forum, which is that it becomes an integrated part of your service. All of a user's contributions in different areas, including the discussion forum, are queryable from a single database and viewable on a single page.

Exercise 1

Visit five sites on the public Internet with discussion forums, one of which can be the Medium Format Digest forum at photo.net (http://www.photo.net/bboard/q-and-a?topic_id=35). For each site gather the following statistics: List the user interface and customer service features that you think are the best from these five sites and give a brief explanation of why each feature is good.

One Forum or Many?

I certainly had no feeling for harmony, and Schoenberg thought that that would make it impossible for me to write music. He said, 'You'll come to a wall you won't be able to get through.' So I said, 'I'll beat my head against that wall.'
—John Cage
How many forums should a site have? Let's consider a site for music lovers. Would one forum be enough? Maybe not. Will the classical music lovers be interested in a discussion of Pat Boone's cover of AC/DC's "It's a Long Way to the Top (If You Wanna Rock 'N' Roll)"? So it will be a good idea to split the discussion into at least two forums: Classical and Pop. Let's say that a Pat Boone fan comes into the Pop forum one day and encounters a discussion of the lyrics from Ice Cube's Death Certificate or an MP3 from Prodigy's Fat of the Land. We'll clearly need to split up the Pop forum into Christian Pop, Techno, and Rap. We're expecting a lot of Beatles fans as well. Which of these forums would they gravitate toward? Maybe we need a '60s Rock forum. On the classical side there are a lot of grand opera nuts who won't want to be distracted by discussions about authentic instrument performances of Baroque music. Sophisticated modern music fans discussing John Cage's "Four Minutes, Thirty-three Seconds" won't want to waste time discussing the fossils of the 18th and 19th Centuries. And if we turn our attention to the many styles of Jazz ...
If something is boring after two minutes, try it for four. If still boring, then eight. Then sixteen. Then thirty-two. Eventually one discovers that it is not boring at all.
—John Cage

It would be easy to justify the creation of 100 separate forums on our music site. And indeed USENET contains more than 50 rec.music.* groups, including rec.music.beatles.moderated, for example. That turns out to be the tip of the iceberg, for the alternative hierarchy sports more than 700 alt.music.* groups, including alt.music.celine-dion and alt.music.j-s-bach. If USENET can support nearly 1000 discussion forums, surely a popular comprehensive music site ought to have at least 100.

Maybe not.
She had a voice like the New Jersey State Anthem played on an electric razor.
Bright Lights, Big City by Jay McInerney

When discussion is fragmented, it is hard for a community to get off the ground. If there are 50 users and 100 forums, how will those users find each other? The average visit will result in a user concluding that the community isn't active. Such a user is unlikely to return or refer a friend to the site. Even when a community is large enough to support numerous forums, presenting discussion in a fragmented manner leads to extra work for the user whose interests are diverse. Suppose that a music scholar comes to USENET looking to see if there has been any recent discussion of Bach's "Schubler Chorales" and their influence on later composers. That's as simple as visiting alt.music.j-s-bach. If that scholar wants to check up on recent postings concerning Celine Dion's "My Heart Will Go On", he or she will have to scan alt.music.celine-dion separately.

A good example of a thriving community with a single discussion forum is slashdot.org. It is very easy to find the topics being actively discussed on slashdot: look at the front page.

It is possible to take the "one forum" and "many forum" approaches on the same site at the same time. For example, look at http://www.photo.net/bboard/ (static copy at http://philip.greenspun.com/seia/images-discussion/photonet-bboard-original.htm). There are separate Medium Format, Nature Photography, and Photo Critique forums. For a user to browse the new postings in these three forums will require six mouse clicks: down into this page, down into Medium Format, back, down into Nature, back, down into Critique. With a different SQL query, however, postings from all these very same forums can be combined on one page, as in http://www.photo.net/bboard/unified/ (static copy at http://philip.greenspun.com/seia/images-discussion/photonet-bboard-unified.htm). Postings from particular forum topics may be distinguished with a special publisher-chosen color or icon. Suppose that the user finds the Photo Critique forum overwhelming and uninteresting. These postings can be excluded from his or her personalized unified view via clicking on the "Customize forums" link at the top (static copy at http://philip.greenspun.com/seia/images-discussion/unified-forum-personalization.htm) and unchecking those forums that are no longer of interest.
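The "different SQL query" behind a unified, personalized view might look something like the sketch below. The three tables are hypothetical stand-ins invented for this example (photo.net's actual data model is not shown in this chapter), and user 37 is an arbitrary example value.

```sql
-- hypothetical tables, for illustration only
create table forums (
       forum_id      integer primary key,
       name          varchar(100) not null,
       icon_url      varchar(200)   -- publisher-chosen icon for the unified view
);

create table forum_postings (
       posting_id    integer primary key,
       forum_id      integer not null references forums,
       subject       varchar(200) not null,
       posting_date  date not null
);

-- one row per forum that a user has unchecked on the customization page
create table user_excluded_forums (
       user_id       integer not null,
       forum_id      integer not null references forums,
       primary key (user_id, forum_id)
);

-- postings from all forums on one page, newest first,
-- minus the forums that user 37 has excluded
select f.name, f.icon_url, p.posting_id, p.subject, p.posting_date
from forum_postings p, forums f
where p.forum_id = f.forum_id
and not exists (select 1
                from user_excluded_forums x
                where x.user_id = 37
                and x.forum_id = p.forum_id)
order by p.posting_date desc;
```

Storing exclusions rather than inclusions is one possible design choice: a newly created forum then shows up in every member's unified view by default.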

Exercise 2: Design the User Experience

Figure out whether your service should have one forum, one forum with categories, several forums, several forums each with categories, or something else. Document the page flow for your users (recall the example page flow diagram from the "User Registration" chapter).

Exercise 3: Document the Data Model

Document how you intend to spread the discussion forum data among the content repository tables that you defined in the "Content Management" chapter.

Exercise 4: Build the User Pages

Implement the user experience that you designed in Exercise 2.

Exercise 5: Build the Admin Pages

Design a set of admin pages. In this case it is usually better to start with a required list of tasks that must be accomplished. Then try to build a page flow that will let the administrator accomplish those tasks in as few clicks as possible.

Recall from the "User Registration" chapter an important user interface principle to keep in mind: it is more natural for most computer users to pick the noun first and then the verb. For example, the forum moderator might first click on a message's subject line to select it and then, on a subsequent page, select an action to perform on this message: delete, approve, rate, categorize, etc. It is technically feasible to build a system in which the moderator is first asked "Would you like to delete some messages?" and then is prompted for the messages to be deleted. However, this is not how the Apple Macintosh was designed, and therefore anyone who has used the Macintosh user interface or its derivatives, notably Microsoft Windows, will be accustomed to the noun-verb order.

This is your community and these are your users. So in the long run only you can know what administrative actions are most needed. At a minimum, however, you should support the following:

In-Class Presentations

At this point we recommend that teams present their functioning discussion forum implementations. So that the audience can evaluate the workability of the interface, the forums should be preloaded with questions and answers of realistic length, with material copied from Google Groups if necessary.

A suggested outline for the presentation is the following:

The presentation should be accompanied by a handout that shows (a) the data model that supports discussion, (b) any SQL code invoked by the URL that displays one thread of discussion (pulled out of whatever imperative language scripts it is embedded in), and (c) the results of the query trace.

Usability

At this point your discussion forum should work. Users can register. Users can ask questions. Users can post answers. Is it usable? Well, consider that most computer programs were considered perfect at one time by their creator(s). It is only in encounters with real users that most problems become evident.

These encounters between freshly minted Internet applications and first users have become increasingly startling for all parties. One reason is the large and growing user experience gap. In 1994 the average Web user was a researcher with a Unix machine on his or her desk. Very likely the user knew how to write at least simple computer programs. The average Web page was straight HTML 2.0 with no scripts or other active components. All Web pages worked the same: you read the black text, you clicked on the blue text, you were reminded by the purple text that you'd already visited a link. Once you learned how to use your first Web site you knew how to use all subsequently visited sites.

The user experience gap has grown larger because the users are less sophisticated while the applications have grown more complex. In 2005 the average Web user is a first-time computer user and the Web browser may be the only application that he or she knows how to use. Despite the manifest inability of these users to cope with a complex user interface, Web sites have been tarted up with JavaScript, ActiveX, Java, Flash, to the point where they are as hard to use and different from each other as old Unix applications. Users unable or unwilling to deal with the horrors of custom user interfaces have voted with their mice. They buy at Amazon. They search at Google. They get their information from Yahoo! and nytimes.com.


Figure 8.2: As the Internet gets older, applications become more complex and difficult to use while the average user becomes less and less experienced. Source: Mark Hurst, www.goodexperience.com.



Idiosyncratic ideas make sense for magazine and television advertisements. Different is good when it takes the user the same 30 seconds to absorb the message. But different is bad if it means the user needs extra time or extra clicks to get to the desired task. Some studies show that on each extra click there is a 50 percent chance that a user will abandon the site altogether.
As an aid to deciding whether to spend your future as an engineer or go on to business school, note that Webvan CEO George Shaheen ran the company into the ground, then resigned shortly before the bankruptcy filing, collecting a $375,000-per-year for life retirement package.
In mid-2000, Webvan purchased HomeGrocer, a competing grocery delivery company, and converted the old HomeGrocer users to the new Webvan user interface. Orders fell by more than half. The HomeGrocer business went from breaking even to losing lots of cash simply because of the inferior usability of the Webvan software. Ultimately Webvan went bankrupt, taking with it $1.2 billion of invested cash.

How is it possible that people follow what they imagine to be their own good taste instead of either copying the successful Internet services (e.g., Yahoo!, Amazon, Google) or listening to the users? And that people continue to believe in the value of their own ideas even as the red ink starts to dominate their financial reports? Justin Kruger and David Dunning, experimental psychologists at Cornell University, wondered the same thing and wrote up their findings in "Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments" (Journal of Personality and Social Psychology; Vol 77, No. 6, pp 1121-1134; http://www.phule.net/mirrors/unskilled-and-unaware.html). Kruger and Dunning found that people in the 12th percentile of skill estimated themselves to be in the 62nd. Furthermore, these incompetent people failed to recalibrate themselves when shown the range of performance by their peer group. The authors concluded that "those with limited knowledge in a domain suffer a dual burden: Not only do they reach mistaken conclusions and make regrettable errors, but their incompetence robs them of the ability to realize it."





Figure 8.3: Source: "Why You Only Need to Test With 5 Users" by Jakob Nielsen; http://www.useit.com/alertbox/20000319.html



Exercise 6: The Usability Test

A scientist is someone who measures her results against Nature. An engineer is someone who measures her results against human needs. A computer scientist is someone who doesn't measure his results.
— us
An ideal usability test involves the following elements:
  1. a test subject whose experience with computers and the Internet is comparable to what you expect for your average user
  2. a set of tasks that you want the subject to try to accomplish
  3. a quiet comfortable environment for the test subject
  4. no assistance from the product developers
  5. observation of the test subject through a one-way mirror
  6. videotaping of the test subject's experience for later study
Conduct a usability test of your discussion forum software, incorporating elements 1-4 from the list above. You should find at least four testers from among your friends—do not pick anyone who is taking this course (classmates will have too many subconscious expectations). Run your usability test subjects in series, one after the other, with your entire team observing and writing down what happens. Ask your subjects to voice their thoughts aloud. How long does it take the subject to complete a task? Does the subject get stuck on any step? Does the subject indicate confusion as to the appropriate next step at any time?

Use the following script of tasks (cut and paste these into a separate document and print it out, after filling in the bracketed sections), with no extra hints:

  1. starting as an unregistered user at the site home page, find the area on the site where one would ask questions of other users (if you can't accomplish this task, or any other task on this page, within 3 minutes, give up and move on)
  2. read through the existing questions and answers to determine whether or not [some question that has been asked already] has been asked and answered already; if not, post a question on that subject (registering if necessary)
  3. read through the existing questions and answers to determine whether or not [some question that has not been asked already] has been asked and answered already; if not, post a question on that subject
  4. log out
  5. log in with the existing username/password of [user/pass] and try to find all the unanswered questions in the discussion forum
  6. answer the question(s) that you yourself posted a few minutes earlier, pretending to be this other user
  7. log out
  8. log in with the existing username/password of [admin username/password] and find the administrator's pages
  9. delete the discussion forum thread(s) that you created earlier
  10. log out
In between test subjects, clean up any rows that they may have left in database tables. If your first subject has a disastrous experience, consider taking a few hours off to fix your software, add links and annotation, etc., before proceeding with the second subject.

Stand as far away from the subject as you possibly can while still being able to see the computer screen and hear the subject's comments. Force yourself to remain absolutely silent. If the subject is completely confused and clicking around randomly, let the subject continue until he or she figures it out. Keep track of the number of seconds each subject requires to complete each task.

Post a report on your team server at /doc/testing/discussion-usability. This report will contain a summary of what you learned from this test with average task times and average total time (we can use these to compare the efficiency of various teams' solutions). The report should contain hyperlinks to sub-pages that contain transcripts of individual user sessions, what each test subject said, and what happened. Link to your report from your main documentation index page.

Discussion for Education

Recall from the introduction that our goal in working through this text is to build an online learning community. An active discussion forum might be evidence of a tremendous amount of member-to-member education or it could merely be a place where loudmouths enjoy seeing their name in print. Moderation is the first line of defense against postings that aren't responsive to the original question or helpful to the would-be learners.

Building more structure into a discussion forum is an option worth considering, especially if your discussion forum is supporting an organized class. The Berkman Center at Harvard Law School (HLS) was a pioneer in this area. The teachers at HLS weren't happy with the bias in favor of early responders inherent in a standard discussion forum system. The first response to a question gets the most readers because it is near the top of the page, so it might be more ego-gratifying to be first than to spend more time crafting a thoughtful response. This shortcoming was addressed by writing what they call a semi-synchronous discussion forum. Responses are collected for a period of time, but not made public until the deadline for responses is reached. The system is called the Rotisserie.
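The "collect now, publish at the deadline" rule can be expressed directly in the query that serves a discussion page. The following is only a sketch of the idea and is not the Rotisserie's actual schema; the table names, column names, and the example question and user IDs are assumptions.

```sql
-- hypothetical tables, for illustration only
create table timed_questions (
       question_id        integer primary key,
       question_text      varchar(4000) not null,
       response_deadline  timestamp not null
);

create table timed_responses (
       response_id    integer primary key,
       question_id    integer not null references timed_questions,
       author_id      integer not null,
       body           varchar(4000) not null
);

-- responses to question 12 as seen by user 37: everything once the
-- deadline has passed; before that, only the viewer's own response
select r.response_id, r.author_id, r.body
from timed_responses r, timed_questions q
where r.question_id = q.question_id
and q.question_id = 12
and (q.response_deadline <= current_timestamp
     or r.author_id = 37);
```

Because the visibility rule lives in the query rather than in a flag on each row, nothing has to run at the moment the deadline passes.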

An additional capability of the Rotisserie is the ability to randomly assign participants to respond to postings. For example, every student in a class will be required to post an essay in response to a question. After a deadline lapses, those essays are made public. The Rotisserie then assigns to each participant the task of responding to a particular essay. Every student must write an essay. Every essay gets a response. A particularly good or controversial essay might get additional responses. A particularly loudmouthed participant might elect to respond to many essays.

See http://h2o.law.harvard.edu for more information about the Rotisserie, to try it out, or to download the software.

Suppose that your online learning community is more open and fluid. You can't insist that particular people respond at all or that people respond on any kind of schedule. Is there anything that can be done with software to help ensure that all questions get answered appropriately? Yes! Build server-mediated mentoring.

Server-mediated mentoring requires, at a minimum, two things: (1) a mechanism for novice members (mentees) to be connected with more experienced members (mentors), and (2) asking people who post questions whether or not their question has been adequately answered. To make the service as effective as possible, you'll probably want to add at least the following: (3) automated reminders from the server to mentors who have left mentees hanging, and (4) rewards, rankings, and distinguishing typography to recognize community members who are answering a lot of questions and mentoring a lot of novices.

Imagine the following interaction:

How can you estimate the effort required in building the full user experience example? Start by looking at the number of new tables and columns that you'd be adding to the system and the number of new URLs to which the server would be responding. Then try to find a subsystem that you've already built for this project with a similar number of tables and page scripts. The implementation effort should be comparable.

Let's start with the data model. To support requests for and assignment of mentors, you'll need at least one table, mentor_mentee_map, with the following columns: mentee, mentor (NULL, if not assigned), date_of_request, date_of_assignment, mentee_goal. To support the query "who is the currently connected member mentoring" and build the workspace subsection page for Jane, you'll want to add an index on the mentor column. To support the query "are there any mentors who should be notified about a message posted by a member", you would add an index on the mentee column. If you were to make this a concatenated index on mentee, mentor, it would help the database identify outstanding requests for mentors (mentor is NULL) efficiently for the "be a mentor" page.
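One way the table and indices just described might be written out is sketched below. The column names come from the text; the datatypes, the NOT NULL choices, and the existence of a users table with a single-column primary key are assumptions.

```sql
-- assumes that a users table with a single-column primary key already exists
create table mentor_mentee_map (
       mentee              integer not null references users,
       -- NULL until a mentor has been assigned
       mentor              integer references users,
       date_of_request     date not null,
       date_of_assignment  date,
       mentee_goal         varchar(4000)
);

-- "who is the currently connected member mentoring?"
create index mentor_mentee_map_by_mentor on mentor_mentee_map(mentor);

-- "are there any mentors who should be notified about a message posted
-- by a member?"; concatenated with mentor as suggested in the text
create index mentor_mentee_map_by_mentee on mentor_mentee_map(mentee, mentor);

-- outstanding requests for the "be a mentor" page, oldest first
select mentee, date_of_request, mentee_goal
from mentor_mentee_map
where mentor is null
order by date_of_request;
```

The concatenated index matters in Oracle because a B-tree index does not store an entry when every indexed column is NULL; pairing mentor with the never-NULL mentee column keeps the unassigned rows in the index.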

Attempting to support the open/closed question status display and the query "Which members have answered a lot of questions well?" might make you regret some of the data model decisions that you made in the preceding exercises and/or in the "Content Management" chapter exercises. In the "Content Management" chapter we have a headline asking "What is Different about Discussion?" above the suggestion that the content_raw table can be used to support forum questions and answers. If you went down that route and were implementing the mentoring user experience, this is where discussion would diverge a bit from the rest of the content on the site. You need a way to represent in the database management system whether a discussion forum question is open or closed. If you add a discussion_forum_question_status column to the content_raw table you'll have a NULL column value whenever the content item is not a discussion forum question. That's not very clean. You may also be adding a closed_question_p boolean column to indicate that a forum posting had been identified by the original questioner as having answered the question. This will be NULL for more than 99 percent of content items. That's not a storage efficiency problem, but it is sort of ugly.

An alternative to adding columns is to build some sort of bag-on-the-side table recording which questions are open and closed and which answers closed them. To decide whether or not this is a reasonable approach, it is worth starting by asking "In what percentage of queries will the helper table need to be JOINed in?" When presenting articles and comments, you wouldn't need the table. When presenting the discussion forum to a public user, i.e., someone who wasn't logged in, the discussion forum page scripts wouldn't need the table data. You might need these data only when serving workspace pages to members and when serving an individual discussion forum thread to a logged-in member. It might be worth considering a table of the following form:

-- content_id is the primary key here; it is possible to have at most
-- one row in this table for a row in the content_raw table

create table discussion_question_status (
       content_id   not null primary key references content_raw,
       status       varchar(10) check (status in ('open', 'closed')),
       -- if the question is closed the next column will contain
       -- the content_id of the posting that closed it
       closed_by    references content_raw
);

-- make it fast to figure out whether a posting closed a question
create index discussion_question_status_by_closed_by on
discussion_question_status(closed_by);
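A sketch of how page scripts might use this table over the life of one question. The content_id values 1001 (a question) and 1002 (an answer) are made-up examples and are assumed to exist already in content_raw.

```sql
-- a newly posted question starts out open
insert into discussion_question_status (content_id, status)
values (1001, 'open');

-- the original questioner marks posting 1002 as having answered the question
update discussion_question_status
set status = 'closed', closed_by = 1002
where content_id = 1001;

-- all questions still awaiting a satisfactory answer
select content_id
from discussion_question_status
where status = 'open';

-- did posting 1002 close a question? (served by the closed_by index)
select content_id
from discussion_question_status
where closed_by = 1002;
```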

As the community gains experience with this system, it will probably eventually want to give greater prominence to responses from members with a history of writing good answers. In a fully normalized data model, for each answer displayed, the server would have to count up the number of old answers from the author and query the discussion_question_status table to figure out what percentage of those were marked as closing the question. In practice, you'd probably want to maintain a denormalized metric as an extra column or columns in the users table, perhaps columns for n_answers_posted and n_answers_closing, counts maintained by nightly batch updates or database triggers.
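The two approaches can be compared in a sketch. It assumes that content_raw has creation_user and content_type columns, that forum answers carry a content_type of 'forum_answer', and that the users table is keyed by user_id; none of these is confirmed by this chapter, so adjust the names to match your own data model. User 37 is an arbitrary example.

```sql
-- normalized: compute the metric at page-serving time for user 37
select count(distinct cr.content_id) as n_answers_posted,
       count(distinct dqs.closed_by) as n_answers_closing
from content_raw cr
left outer join discussion_question_status dqs
     on dqs.closed_by = cr.content_id
where cr.creation_user = 37
and cr.content_type = 'forum_answer';

-- denormalized: keep the counts in the users table instead
alter table users add n_answers_posted integer default 0;
alter table users add n_answers_closing integer default 0;

-- nightly batch update that recomputes both counts for every user
update users
set n_answers_posted = (select count(*)
                        from content_raw cr
                        where cr.creation_user = users.user_id
                        and cr.content_type = 'forum_answer'),
    n_answers_closing = (select count(distinct dqs.closed_by)
                         from discussion_question_status dqs, content_raw cr
                         where dqs.closed_by = cr.content_id
                         and cr.creation_user = users.user_id);
```

With the denormalized columns in place, a thread page can pick up both counts in the same join to users that it already performs to display each author's name.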

Supporting the "initially show only to my mentor" option for new content would require the addition of a show_only_to_mentor column to the content_raw table, where it could be used for discussion forum postings, comments on articles, and any other content item. Rather than changing all of the pages that use the content tables it would be easier to update the SQL views that those pages use, e.g., articles_approved, so as to exclude content that should be shown only to a mentor.
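A sketch of the column addition and view change. The actual definition of articles_approved depends on what you built for the "Content Management" chapter; the content_type, approved_p, and creation_user columns used here are assumptions, as is the 't'/'f' convention for boolean values. The create or replace view syntax works in Oracle and PostgreSQL, among others.

```sql
-- 'f' by default, so existing content remains publicly visible
alter table content_raw add show_only_to_mentor char(1) default 'f'
      check (show_only_to_mentor in ('t', 'f'));

-- hypothetical view definition, now excluding mentor-only content
create or replace view articles_approved as
select *
from content_raw
where content_type = 'article'
and approved_p = 't'
and show_only_to_mentor = 'f';

-- what a mentor (user 37) sees on a workspace page:
-- mentor-only content from his or her mentees
select cr.content_id
from content_raw cr, mentor_mentee_map mmm
where cr.show_only_to_mentor = 't'
and cr.creation_user = mmm.mentee
and mmm.mentor = 37;
```

Page scripts that already select from the view need no changes; only the new mentor-facing pages query content_raw directly.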

Some new page scripts would be required, at least the following:

Modifications would likely be required to the following pages:

For the purposes of this course, you need not implement all of these grand ideas, and indeed some of them don't make sense when a community is just getting started because the number of members is so small. If, however, some of these ideas strike you as interesting, consider adding them to your project implementation plan.

Exercise 7: Refinement Plan

Prepare a plan for how you're going to improve your discussion forum system, including any changes to data model, page flow, navigation links, page layout, annotation (help text), etc. Place this plan on your team server at /doc/planning/YYYYMMDD-discussion. (If you name files with year-month-day in the beginning, they will sort in order of creation.)

Exercise 8: Client Signoff

Ask your client to visit the discussion forum user and admin pages. Ask your client to review your usability test results and refinement plan. This is a good chance to impress your client with the soundness of your methodology. If your client responds via email, make that your answer to this exercise. If your client responds orally, make notes from that conversation your answer.

Exercise 9: Execute

After consultation with your teaching assistant, execute your planned improvements.

Time and Motion

One programmer who has mastered the basics of Web/db scripting can usually whip out a basic question-and-answer forum in 8 hours. The team together will need to spend about one hour preparing a good in-class presentation. The team together will generally require 3 hours to conduct and write up the user test. Talking to the client and refining the forum will generally take at least as long as the initial development effort.

eve@eveandersson.com, philg@mit.edu, aegrumet@mit.edu