Star Streak. Ancient Bristlecone Pine Forest (around 10,500 feet above sea level). California.

Solar Magnitude Forum

designed by Julie Melton and Philip Greenspun, updated July 2007

Every online community starts with at least one discussion forum. As the community grows, people eventually decide that there is too much traffic in that one forum and the discussion should be split by topic. Eventually you might end up with something like http://www.photo.net/community/, a community with 35 separate discussion forums, some of which get little traffic despite there being 600,000 registered users and 4+ million postings in the bboard table.

There is a bit about this in the Discussion chapter of Software Engineering for Internet Applications:

How many forums should a site have? Let's consider a site for music lovers. Would one forum be enough? Maybe not. Will the classical music lovers be interested in a discussion of Pat Boone's cover of AC-DC's "It's a Long Way to the Top (If You Wanna Rock 'N' Roll)"? So it will be a good idea to split the discussion into at least two forums: Classical and Pop. Let's say that a Pat Boone fan comes into the Pop forum one day and encounters a discussion of the lyrics from Ice Cube's Death Certificate or an MP3 from Prodigy's Fat of the Land? We'll clearly need to split up the Pop forum into Christian Pop, Techno, and Rap. We're expecting a lot of Beatles fans as well. Which of these forums would they gravitate toward? Maybe we need a '60s Rock forum. On the classical side there are a lot of grand opera nuts who won't want to be distracted by discussions about authentic instrument performances of Baroque music. Sophisticated modern music fans discussing John Cage's "Four Minutes, Thirty-three Seconds" won't want to waste time discussing the fossils of the 18th and 19th Centuries. And if we turn our attention to the many styles of Jazz ...

It would be easy to justify the creation of 100 separate forums on our music site. And indeed USENET contains more than 50 rec.music.* groups, including rec.music.beatles.moderated, for example. That turns out to be the tip of the iceberg, for the alternative hierarchy sports more than 700 alt.music.* groups, including alt.music.celine-dion and alt.music.j-s-bach. If USENET can support nearly 1000 discussion forums, surely a popular comprehensive music site ought to have at least 100.

When discussion is fragmented, it is hard for a community to get off the ground. If there are 50 users and 100 forums, how will those users find each other? The average visit will result in a user concluding that the community isn't active. Such a user is unlikely to return or refer a friend to the site. Even when a community is large enough to support numerous forums, presenting discussion in a fragmented manner leads to extra work for the user whose interests are diverse. Suppose that a music scholar comes to USENET looking to see if there has been any recent discussion of Bach's "Schubler Chorales" and their influence on later composers. That's as simple as visiting alt.music.j-s-bach. If that scholar wants to check up on recent postings concerning Celine Dion's "My Heart Will Go On", he or she will have to scan alt.music.celine-dion separately.

A good example of a thriving community with a single discussion forum is slashdot.org. It is very easy to find the topics being actively discussed on slashdot: look at the front page.

It is possible to take the "one forum" and "many forum" approaches on the same site at the same time. For example, look at http://www.photo.net/bboard/ (static copy at http://philip.greenspun.com/seia/images-discussion/photonet-bboard-original.htm ). There are separate Medium Format, Nature Photography, and Photo Critique forums. For a user to browse the new postings in these three forums will require seven mouse clicks: down into this page, down into Medium Format, back, down into Nature, back, down into Critique. With a different SQL query, however, postings from all these very same forums can be combined on one page, as in http://www.photo.net/bboard/unified/ (static copy at http://philip.greenspun.com/seia/images-discussion/photonet-bboard-unified.htm). Postings from particular forum topics may be distinguished with a special publisher-chosen color or icon. Suppose that the user finds the Photo Critique forum overwhelming and uninteresting. These postings can be excluded from his or her personalized unified view via clicking on the "Customize forums" link at the top (static copy at http://philip.greenspun.com/seia/images-discussion/unified-forum-personalization.htm) and unchecking those forums that are no longer of interest.
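The "different SQL query" idea can be sketched in a few lines. This is only an illustration, not photo.net's actual schema or code: the postings table, its columns, and the unified_postings function are assumptions made up for the example, and SQLite stands in for whatever RDBMS the site runs.

```python
import sqlite3


def unified_postings(conn, excluded_forums=()):
    """Return postings from every forum except the excluded ones, newest first."""
    placeholders = ",".join("?" for _ in excluded_forums)
    where = f"WHERE forum NOT IN ({placeholders})" if excluded_forums else ""
    sql = f"SELECT forum, subject, posted FROM postings {where} ORDER BY posted DESC"
    return conn.execute(sql, tuple(excluded_forums)).fetchall()


# Hypothetical table and sample rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (forum TEXT, subject TEXT, posted TEXT)")
conn.executemany(
    "INSERT INTO postings VALUES (?, ?, ?)",
    [
        ("Medium Format", "Hasselblad or Rollei?", "2007-07-01"),
        ("Nature Photography", "Tripod for birding", "2007-07-03"),
        ("Photo Critique", "Please rate my sunset", "2007-07-02"),
    ],
)

everything = unified_postings(conn)
without_critique = unified_postings(conn, excluded_forums=["Photo Critique"])
```

The per-user list of unchecked forums becomes the excluded_forums argument, so personalization costs one extra clause in the query rather than a separate page per forum.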

The unified discussion forum works reasonably well at photo.net, but having only one dimension along which to categorize postings has proved limiting. There is a Nikon forum and a Sports forum, for example. In which one should a reader post a question about the best Nikon-mount lens to use when photographing a volleyball game? A user of USENET circa 1980 would be able to cross-post into both Sports and Nikon but maybe we can do better.

What if we had multiple dimensions? What would they be for photography?

How about for aviation?

The Analogy

You look up at the night sky with its infinitude of stars (like the 3.5 million discussion forum posts at photo.net). What objects do you see? Those that are either very close (Earth's moon) or those that are very bright (a supernova in a galaxy far far away). This is how the Solar Magnitude Forum should work.

Example

Joe Pilot tells the system "I have a 1987 Piper Malibu based at KSMO (Santa Monica, California). I fly to Baja California regularly, am instrument rated, and am thinking about adding a helicopter rating to my Private certificate."

An ideal forum system would show Joe Pilot almost anything related to the KSMO airport and the 1984-1988 Malibus with the Continental engine. Joe would also probably want to see threads relating to the later years of the Piper Malibu when they changed to a Lycoming engine and renamed the plane "Mirage". Joe would also be interested in hearing about good airport restaurants in Southern California and about any difficulties in getting across the Mexican border and back.

How about a thread about the best airport restaurants in Massachusetts? Definitely not for Joe. In our universe, this is far away and not a very bright object.

What about a thread about a Piper Malibu Mirage crash in Europe? Only 1000 Malibus were ever built, so this would be of vital interest to Joe as a Malibu owner hoping to avoid such a fate. The fact that the crash happened thousands of miles away isn't a factor because the distance on the aircraft dimension is so small.

What about a thread on the airliner that crashed in Lexington, Kentucky after attempting a takeoff from the wrong runway? If the discussion is thoughtful, given how prominent airliner crashes are to the general public, this thread would probably merit Joe's attention. Far away but very bright.

What information do we need to store?

We need to classify each thread along multiple dimensions. After that, we need editors or users to give the thread a brightness according to the productivity of the discussion, the educational value, the humor, etc.
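One way to picture the stored information is the sketch below. Everything in it is an assumption made for illustration: the Thread and Reader classes, the dimension names, the 1-10 rating scale, and the choice to average ratings into a single brightness number.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Thread:
    title: str
    # One value per dimension, e.g. {"aircraft": "Piper Malibu"}.
    position: Dict[str, str]
    # Ratings from editors or users on a hypothetical 1-10 scale.
    ratings: List[int] = field(default_factory=list)

    @property
    def brightness(self) -> float:
        """Average rating, or 0.0 for a thread nobody has rated yet."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


@dataclass
class Reader:
    name: str
    # The values a reader cares about on each dimension.
    interests: Dict[str, Set[str]] = field(default_factory=dict)


def shared_dimensions(reader: Reader, thread: Thread) -> List[str]:
    """Dimensions on which the thread sits exactly where the reader does."""
    return sorted(
        dim for dim, value in thread.position.items()
        if value in reader.interests.get(dim, set())
    )


joe = Reader("Joe Pilot", {
    "aircraft": {"Piper Malibu"},
    "airport": {"KSMO"},
    "region": {"Southern California", "Baja California"},
})
crash = Thread(
    "Malibu Mirage crash in Europe",
    {"aircraft": "Piper Malibu", "region": "Europe"},
    ratings=[8, 9, 7],
)
```

Here shared_dimensions(joe, crash) finds a match on the aircraft dimension only. An exact match is the crudest possible notion of closeness; a real system would want graded distances within each dimension.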

How do we build it?

The relational database management system (RDBMS) is the standard source of persistence for most Web sites. An RDBMS will include a B-tree index facility, good for obtaining quick access to rows that can be sorted along any one dimension. The challenge here is that we have multiple dimensions and need to compute a distance from each user to each thread. It isn't possible to define 10,000 indices on a table, one for each participant in the discussion forum system.

One of the authors (Greenspun) once worked on a system for student-employer matching that would score every student resume in the system against an employer's goals and then return the entire table, sorted by descending score. This required a sequential scan of the resume table, but it performed acceptably because we were able to keep the entire database in RAM and there weren't that many students.
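The same score-everything-then-sort pattern, applied to forum threads, might look like the sketch below. The visibility formula is our assumption, not a settled design: it borrows the inverse-square idea from the night-sky analogy and lets the nearest dimension decide how far away a thread is. The brightness values and distances are invented numbers for Joe Pilot's example threads.

```python
def visibility(brightness, distances):
    """Apparent brightness: intrinsic brightness dimmed by the nearest dimension."""
    nearest = min(distances)
    return brightness / (1.0 + nearest) ** 2


def rank(threads):
    """Sequentially score every thread, then sort by descending visibility."""
    scored = [(visibility(b, d), title) for title, b, d in threads]
    scored.sort(reverse=True)
    return scored


# (title, brightness, distances from Joe on the aircraft and geography dimensions)
threads = [
    ("Malibu Mirage crash in Europe", 5.0, (0.0, 9.0)),
    ("Airport restaurants in Massachusetts", 2.0, (8.0, 6.0)),
    ("Lexington airliner wrong-runway crash", 90.0, (7.0, 5.0)),
    ("KSMO noise abatement", 3.0, (8.0, 0.0)),
]

ranking = [title for _, title in rank(threads)]
```

With these made-up numbers the Malibu crash and the KSMO thread rank first because they are close on one dimension, the airliner crash stays visible because it is very bright, and the Massachusetts restaurants sink to the bottom.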

In this case, it seems as though it might be tougher to do the job as a straight SQL query. We only present complete threads and therefore we can cut down on the number of data items to be examined. In the case of photo.net, the 3.5 million entries are spread among 1 million threads. Suppose we have a table that records values along 20 dimensions. The dimensions on average shouldn't have more than 256 possible values, so that's one byte per dimension to record where a thread sits in our 20-dimensional space. The total size of such a database for photo.net would be only 1 million times 20 bytes or 20 MB, i.e., almost nothing. A substantial amount of processing might need to be done to compare the reader's position to a thread's position (for aircraft, for example, you'd need a big taxonomy where a Cessna 172 and Cessna 182 were closer than a Cessna 172 and a Cessna 310 twin-engine plane).
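A rough sketch of both halves of that paragraph: the flat one-byte-per-dimension table and a taxonomy-based distance. The taxonomy paths for the three Cessnas, the path_distance function, and the coordinates helper are assumptions invented for illustration; a real aircraft taxonomy would be much larger and would be built by someone who knows the fleet.

```python
THREADS = 1_000_000
DIMENSIONS = 20

# One byte per dimension per thread, stored as one flat in-memory array.
positions = bytearray(THREADS * DIMENSIONS)


def coordinates(thread_index):
    """The 20 one-byte coordinates recorded for a single thread."""
    start = thread_index * DIMENSIONS
    return positions[start:start + DIMENSIONS]


def path_distance(a, b):
    """Number of taxonomy edges between two leaves given as root-to-leaf paths."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


# Hypothetical root-to-leaf paths in an aircraft taxonomy.
C172 = ("Cessna", "single-engine piston", "172")
C182 = ("Cessna", "single-engine piston", "182")
C310 = ("Cessna", "twin-engine piston", "310")

size_in_bytes = len(positions)
near = path_distance(C172, C182)
far = path_distance(C172, C310)
```

The array really is 20,000,000 bytes, and the 172 comes out closer to the 182 than to the 310 because the two singles share a longer path from the root. The one-byte codes would index into taxonomies like this one, so the expensive part is the per-dimension comparison, not the storage.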

Next Steps

This would be a good Computer Science master's degree project. If you are interested in our help, please contact us via email.
philg@mit.edu

Reader's Comments

Have you used Experts-exchange.com lately? They seem to have figured this one out. The question data element itself should have a 1:many relationship with categories, similar to the cross post method you mentioned. The cutting edge right now is to use the question title to display context sensitive categories to choose from, and multiple categories can be chosen effortlessly.

I can see this being the next new thing to hit Photo.net. I used to frequent the forums a lot and I know this would not only help to make posting easier, but also aid in getting visibility from a much wider audience of qualified members who can answer the questions.

-- Steve Tout, October 1, 2007
