All user-contributed Web content needs pre-moderation

In the mid-1990s when I started building online communities I didn’t understand why publishers like Amazon pre-moderated all user-contributed content such as comments.  The vast majority of users were intelligent and well-meaning and only a small fraction of material had to be deleted.  It seemed like it wasn’t worth interrupting the flow of conversation and exchange to ensure that an off-topic posting never saw the light of day.  It would be intercepted within a day or so and deleted in any case.


The Manila software that Harvard runs behind these blogs shows the foolishness of my point of view.  More than 90 percent of the comments posted to this blog come from link spammers trying to increase their Google rank by adding comments to old and forgotten postings.  Manila makes it impossible to delete this spam except one comment at a time, each deletion requiring a several-page process of confirmation.  In the old ArsDigita Community System we had “delete all from this user” and “delete all from this IP address” options that made it a lot easier.  But in the Age of Spam what we really need is pre-moderation.  Maybe a vibrant interactive discussion should have an option that lets content go live for 24 hours without being approved.  Otherwise, given the small percentage of useful non-spam content, it seems that the only answer is that nothing goes public without approval.
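The two bulk-delete options mentioned above are easy to picture in code. The following is only a minimal Python sketch, not the ArsDigita Community System's actual implementation; the `Comment` record, its field names, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical record layout, chosen only for this example.
    comment_id: int
    user: str
    ip_address: str
    body: str


def delete_all_from_user(comments, user):
    """Return the comments that remain after dropping everything posted by `user`."""
    return [c for c in comments if c.user != user]


def delete_all_from_ip(comments, ip_address):
    """Return the comments that remain after dropping everything sent from `ip_address`."""
    return [c for c in comments if c.ip_address != ip_address]


# Made-up sample data: one legitimate comment and three spam comments.
comments = [
    Comment(1, "alice", "10.0.0.1", "Nice post."),
    Comment(2, "spambot", "10.0.0.9", "Cheap poker chips!"),
    Comment(3, "spambot", "10.0.0.9", "More poker!"),
    Comment(4, "another-name", "10.0.0.9", "Even more poker!"),
]

after_user = delete_all_from_user(comments, "spambot")
after_ip = delete_all_from_ip(comments, "10.0.0.9")
```

In this sample the per-IP option also catches the spammer who switched names, which is the kind of one-click cleanup that a one-at-a-time delete screen cannot offer.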


Another reason to program in pre-approval only is that eventually the moderators of every online forum find other things to do with their lives.  The server doesn’t realize this and soldiers on processing postings.  Spammers discover a happy home and the database fills up with crud.  Software should be robust to the moderator disappearing, and in an Internet that is mostly spam, that means approval-required-before-going-live.
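The approval-required policy, together with the optional 24-hour provisional window, can be expressed as a single visibility rule. This is a rough Python sketch under assumed names (`Comment`, `is_public`, `provisional_window`), not code from Manila or any other package mentioned here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Comment:
    # Hypothetical record layout, chosen only for this example.
    body: str
    posted_at: datetime
    approved: Optional[bool] = None  # None means no moderator has reviewed it yet


def is_public(comment, now, provisional_window=None):
    """Decide whether a comment may be shown to readers.

    approved=True  -> always visible
    approved=False -> never visible
    approved=None  -> hidden, unless a provisional window is configured and
                      the comment is still younger than that window
    """
    if comment.approved is not None:
        return comment.approved
    if provisional_window is None:
        return False
    return now - comment.posted_at < provisional_window


now = datetime(2005, 1, 10, 12, 0)
fresh = Comment("Interesting point.", posted_at=datetime(2005, 1, 10, 9, 0))
stale = Comment("Buy poker chips!", posted_at=datetime(2005, 1, 1, 9, 0))

strict = is_public(fresh, now)                                  # pure pre-moderation
lively = is_public(fresh, now, timedelta(hours=24))             # 24-hour option
abandoned = is_public(stale, now, timedelta(hours=24))          # nobody reviewed it
```

Because an unreviewed comment is hidden by default, or hidden again once the window expires, a forum whose moderator has wandered off stops publishing anything new instead of filling up with spam.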

25 thoughts on “All user-contributed Web content needs pre-moderation”

  1. Both of the blogging tools that I have used (MovableType and WordPress) have features that make the kind of filtering you describe possible. I’m sure many others do as well.

  2. I’ve hacked WordPress to require additional, trivial input (“Are you a spammer?”, with the default response set to yes) upon comment submission. Without screwing around with blacklists, moderation, registration, etc., that stopped all comment spam. The user response is saved to a cookie, so they only have to set it the first time they post a comment.

    The thing to remember about comment spammers is that they rely on automation. They aren’t generally visiting sites and posting spam manually. If you can trip them up (and as long as you can roll your own solution to some degree, so you aren’t relying on a patch that might become so successful the bad guys account for it), you probably won’t see much spam at all.

  3. Movable Type did add comment screening in version 3, but only after they learned from version 2, which showed you only five comments at a time for deletion, a big pain when you get close to 100 spam comments per day. I ended up writing a special script called bulkdelete to handle this. I still don’t understand why Six Apart, which sells Movable Type, has not created a Bayesian comment deleter that learns to identify spam, the sort of technology that now screens most email.

  4. I’ve run blogs with both; both are good. WP is open source, though, which gives you much more flexibility to hack away if you want to change something. MT does have a plug-in architecture, and there are many plug-ins available. WP is a little more difficult to admin, IMO. You have to get your hands dirty in PHP even for fairly simple cosmetic changes. WP templates are more HTML-centric. If I were starting a new blog I’d go with WP because it is open source. My main blog, with 2000 posts, is on MT and I have no plans to migrate off the platform. It works fine.

  5. I have found WordPress to be far easier to install, maintain, and hack up than MT. I believe the anti-comment-spam tools/plugins available are better as well.

    However, it should be noted that I’ve not given MT v3 any more than a casual glance. V2 was the last full release I tried using and it frustrated me to no end.

  6. While we are on the topic, could you migrate photo.net’s forum software to something more modern? Pretty please?

  7. On my http://sandiegoblog.com/ I run WordPress with the Spam Karma plugin, and it does an *excellent* job filtering out spam.

    Before I added the Spam Karma plugin, I had a cron job that disabled comments for all posts older than a month, which cut down on the vector Philip describes, of old dead posts that are no good for anything but attracting spammers.

  8. One surefire way to cut 95-99% of spam is to require a user account/user registration. I’ve had to lock down my blog (azplace.net) because of all the online poker/Viagra spam. Formulating and automating HTTP POST requests is a relatively simple deal… Some sites have solved this riddle of wanting to enable anonymous posts by adding a CAPTCHA image field like the one Google/Blogspot prompts for when somebody creates a blog.

    Prior to changing my comment post policy, I had a little script that ran every 5 minutes to clean off spam comments, but that was an inferior solution, as was a “blacklist of spammer IPs/phrases”, because new ones were always being added. And I think Nucleus CMS > WordPress > MT.

  9. Hello!

    I host a web page (feisar.de) where everybody can freely post comments. It took 2 years until the spammers found it, but then I received about 300 spam postings a week. I was forced to add a simple spam filter, that would check for words like party poker etc. After 4000+ filtered spam postings, they finally gave up! 🙂

  10. I had an idea similar to the one posted by Dave above. If every blog required something special from the posters, like clicking an added button or starting the comment with some specific line (like “I am not a spammer”), it would be too costly to write spam bots. And looking at it from the other perspective, it seems fair to require some attention from commenters if they want others to pay attention to their comments. This scheme should work for email too: http://zby.aster.net.pl/kwiki/index.cgi?SafePublishingEmailAddresses

  11. For those of us who have already invested time into the Harvard Blogging project, does anyone have a comment spam solution for Manila?

  12. I just added a CAPTCHA test to my blog for anonymous posters, so we’ll see how effective it is. I suppose there are already programs floating around the net that can decode the image. Also, relying totally on somebody transcribing text from an image shuts out those with a vision disability. However, the system I have in effect now doesn’t require a “logged in” user to pass the test…

    Adding special buttons or additional text can be easily spoofed by an automated script bot. And tracking by IP is a mistake too, as it seems these spam bots infest and infect machines all over the internets.

  13. I modified Drupal to also ask a trivial question. It’s highly effective against comment spam. I submitted a patch to them that generalizes it. As long as each blogger thinks up their own question so there is no pattern, and is ready to change the question from time to time, this should work for some time to come.

    Spammers could try to build databases of the questions and get in as much spam as possible before the question changes, but I think that’s not super effective and it’s a long way off. You should never do more against spam than you have to (i.e., captchas are overkill), since every anti-spam trick has collateral damage.

    Of course, on very popular blogs manual spamming is worth the spammer’s effort. Only moderation can stop that, and one hopes not much of this is needed. The use of rel=nofollow should convince spammers that spamming for search engine rank is fruitless before too long.

  14. Philip, I’m the lead dev of WordPress, so you know what my answer will be, but if you’re interested in trying it out I’d be happy to set up an account for you to get a feel for the program.

  15. Philip, I am a little surprised that Manila does not have this as a solution… it should be relatively trivial to program; but if it isn’t, maybe you could cough up the $15 a month it takes to get a real host? You could use frames to hide the fact that you were no longer using Manila, and keep your current URL.

  16. I dislike CAPTCHA and am much less likely to leave a message somewhere if I have to use it. Maybe I’m unusually lazy, but it seems like too much of the antispam effort is shifted my way in a CAPTCHA weblog setup, especially when less invasive means are available.

    For example, here’s the URL of a post with the ask-a-simple-question antispam measure I wrote about above:

    http://www.jesush.com/index.php?p=752

    Note that you have to answer “No” to the question, or the post isn’t accepted. Answering “No” saves that answer to your cookie in the same way email and name are saved, so you’ll never have to set that manually again from your current computer (unless I have to change the question!)

    Finally, I’d love to see you switch to WordPress for this blog–or some package that will tell commenters which markup they can use and make required fields obvious in the comment window.

  17. Philip, I can’t offer you a comparison of MT and WP, but I can say I am not particularly impressed with MT3’s anti-spam technology after setting up some sites for my girlfriend (who gets quite a bit of comment spam). The best path available is to hand over user registration to Six Apart and require users to obtain a “TypeKey” account on the Six Apart servers. If you don’t want to do this (I don’t) you have to go into “comment moderation” mode, in which every single post must be approved. I opted for comment notification by email and now my inbox clogs up with notifications, almost all of which are spam. And that’s just for comments on my own posts, which are maybe 1/30th of the total site content.

    I’m amazed that Six Apart has not come up with a better solution given that this is the one hard problem that needs solving in the world of weblogging, and they are trying to sell software against open source alternatives.

    That said I always recommend TypePad, the hosting service run by Six Apart using Movable Type, to people looking for an easy weblog system. The Pro version is pretty cheap and in the FAQ they say they monitor for comment spam and delete much of it on your behalf, which I guess they can do because they can see what is being posted to a whole conglomeration of weblogs. You can also have your own domain name and such.

  18. Manila seems to represent most of the things that are wrong with the modern software industry.

    A blog application, especially one as simple as Manila, could be built by a twelve-year-old with any scripting language and any SQL-like database server. But instead of building on 30 years of RDBMS engineering and a decade of web server engineering, they have wasted their energy on building their own object database and web server.

    Perhaps Manila would have been impressive in the early days of blogging but we are now in 2005, a full 5 years since blogging took off.

    ———————————————
    As a fun aside, I just submitted this comment and Manila managed to lose it. Awesome. Luckily IE stored the post data.
    ———————————————

    ———————————————
    Fun aside #2, I just tried to re-post (this time I forgot to enter my email address). It lost my comment again and displayed a message telling me to enter my email address (despite no indication that Email is a required field or that it is even used anywhere).
    ———————————————

  19. Fun aside #3, Manila added some crazy HTML linebreaks to my post (all I entered was a series of hyphens which it retained).

    Surely Harvard will see the light and move to MovableType or WordPress for this server?

  20. Thank you for being generous with your resources… I hope that you will receive more than you need for your time and energy. Keep at work!

  21. I can share my opinion of WordPress too. I have installed it on my site along with the Akismet spam filter. I have to say that in one month it has blocked over 500 spam messages. That’s a very good plugin.
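Several of the cheap automated checks described in the comments above (a trivial challenge question, a phrase filter, and closing comments on old posts) could be chained in front of a moderation queue. The Python sketch below is one assumed way to combine them, not the code any commenter actually ran; the function name, the phrase list, and the 30-day cutoff are illustrative choices.

```python
from datetime import datetime, timedelta

# All three settings are hypothetical values picked for this example.
BLOCKED_PHRASES = ("party poker", "viagra")
EXPECTED_ANSWER = "no"              # required answer to "Are you a spammer?"
MAX_POST_AGE = timedelta(days=30)   # posts older than this accept no comments


def reject_reason(body, challenge_answer, post_date, now):
    """Return a short reason if the comment is rejected, or None if it passes."""
    if now - post_date > MAX_POST_AGE:
        return "comments closed on old post"
    if challenge_answer.strip().lower() != EXPECTED_ANSWER:
        return "challenge question failed"
    lowered = body.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            return "blocked phrase: " + phrase
    return None


now = datetime(2005, 2, 1)
recent_post = datetime(2005, 1, 20)
old_post = datetime(2004, 6, 1)

ok = reject_reason("I agree with the article.", "No", recent_post, now)
bot = reject_reason("I agree with the article.", "yes", recent_post, now)
poker = reject_reason("Play Party Poker today!", "no", recent_post, now)
closed = reject_reason("I agree with the article.", "no", old_post, now)
```

Anything that passes these checks would still land in the approval queue. The checks only reduce the volume a moderator has to read, and a determined human spammer can get past all three.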
