How long is the average Internet discussion forum posting?

A friend of mine who works at a database management system company asked for thoughts on how long a string a database table needs to be able to store, as a practical matter, to serve most Internet programming needs.  This prompted me to run some queries against the photo.net discussion forum.  Here’s my message to him, which I thought would be interesting to nerd readers…


Three basic issues for Web development relating to varchar/clob:


1) Strings uploaded from a browser-rendered TEXTAREA are of a length that is impossible for the programmer to predict. In a sense, then, every text slot in the database must be prepared to accept a string of arbitrary length.

http://www.cs.tut.fi/~jkorpela/forms/textarea.html#browserlimits reveals that some browsers have limits of 32K or 64K but that as Microsoft and Mozilla get more sophisticated these seem to be disappearing.

2) Software developers of Internet applications are often first-time SQL programmers and sometimes first-time programmers altogether. Unless a DBMS can make CLOBs work with every SQL function and command, these novice programmers must learn what is essentially a whole new computer language to deal with CLOBs.


3) Internet applications are often developed using feeble ad hoc tools, such as PHP (my students in 6.171, MIT’s Software Engineering for Internet Applications, mostly picked PHP to do their semester project even though I would have discouraged this, being a mistruster of thrown-together unnecessary new languages). Many of these tools don’t have facilities for dealing with anything beyond the basic SQL data types so they couldn’t use CLOBs if they wanted to.


I think for Web development it is reasonable to expect the average string length from the user to be 300 chars, despite wanting to be prepared for a maximum of 32K or even larger. Oops. Typing that prompted me to do the query (see below). Averaging 2 million messages on photo.net, the correct number is 425. The histogram query reveals that out of 2 million messages over a 10-year period a 32K limit would have resulted in 6 messages being rejected and a 16K limit something like 50 to 65 rejections.


If you wanted to implement something like Salon.com as a single RDBMS table for both articles and comments on articles I think a 64K limit might be required. If someone authors a 5000-word magazine article in Microsoft Word and then saves as HTML that will be 25-30k of content plus at least a factor of 2 in HTML tags and other Microsoft-added filler.
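
As a rough check on that estimate, here is a minimal Python sketch of the arithmetic. The 5 to 6 characters per word is an assumption inferred from the 25-30k figure, K is assumed to mean 1024, and the factor of 2 is only the lower bound given above, so treat the result as a ballpark.

```python
# Back-of-the-envelope size of a magazine article saved as HTML from Word.
# Assumption: 5 to 6 characters per word (including the trailing space),
# which is what "5000 words -> 25-30k" implies.

WORDS = 5000
CHARS_PER_WORD = (5, 6)   # assumed range, inferred from the 25-30k figure
MARKUP_FACTOR = 2         # "at least a factor of 2" in tags and filler
LIMIT_32K = 32 * 1024     # assuming K = 1024
LIMIT_64K = 64 * 1024


def html_size(words, chars_per_word, markup_factor):
    """Estimated size in characters of the saved HTML."""
    return words * chars_per_word * markup_factor


low, high = (html_size(WORDS, c, MARKUP_FACTOR) for c in CHARS_PER_WORD)

print(f"estimated HTML size: {low:,} to {high:,} chars")
print(f"fits under 32K ({LIMIT_32K:,}): {high <= LIMIT_32K}")
print(f"fits under 64K ({LIMIT_64K:,}): {high <= LIMIT_64K}")
```

With only the minimum factor of 2 the article lands between the two limits, which is why 64K rather than 32K looks like the requirement for this case.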

http://www.photo.net/bboard/q-and-a-fetch-msg?msg_id=002oFh is an example of one of the big postings on photo.net. It is 37962 chars long, the HTML is very clean (i.e., much less filler than if saved by Word), and yet the page doesn’t seem excessively long.

So… my conclusion from looking at the queries below is that 32K would do the job for a pure discussion forum system and that it would be marginal for storing articles unless a publisher decided that everything should be broken up into “part 1”, “part 2”, and “part 3”. Looking at the .html files on photo.net, the vast majority are in fact under 32K and most are under 64K.


However, if you look at http://philip.greenspun.com/seia/ the largest chapters are 88.7k.

If you want to facilitate novice programmers building full-scale content management systems where all the content is uploaded from browsers it might be necessary to make the varchar datatype bloat up to 100k-ish. But 32K would be adequate to build something like eBay, amazon (user-uploaded content and much of the publisher content as well), or photo.net discussion forums.


———– some stats from photo.net


select avg(dbms_lob.getlength(MESSAGE)),count(*) from bboard;


AVG(DBMS_LOB.GETLENGTH(MESSAGE))   COUNT(*)
-------------------------------- ----------
        424.672669    2052290


select round(dbms_lob.getlength(MESSAGE),-3), count(*)
from bboard
group by round(dbms_lob.getlength(MESSAGE),-3);



ROUND(DBMS_LOB.GETLENGTH(MESSAGE),-3) COUNT(*)
------------------------------------- ----------
        0  1452035
     1000   510236
     2000    58330
     3000    11972
     4000     3399
     5000     1303
     6000      481
     7000      264
     8000      138
     9000       93
    10000       66
    11000       38
    12000       22
    13000       21
    14000       17
    15000        6
    16000       12
    17000        9
    18000       14
    19000        4
    20000        4
    21000        1
    22000        2
    23000        1
    24000        4
    25000        3
    26000        2
    27000        1
    30000        2
    32000        1
    33000        2
    34000        1
    37000        1
    38000        1

Note:  Oracle did a much better job formatting these in SQL*Plus; for some reason the tabs didn’t carry through after cutting and pasting.
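
For anyone who wants to try other limits against the histogram, here is a small Python sketch that works only from the bucket counts printed above, not from the live table. Because the lengths were rounded to the nearest 1000, a bucket that straddles a limit can only be bounded, so the function returns a range. The function name and the reading of "32K" as 32,000 are illustrative assumptions.

```python
# Counts copied from the histogram above:
# bucket (message length rounded to the nearest 1000) -> number of messages.
HISTOGRAM = {
    0: 1452035, 1000: 510236, 2000: 58330, 3000: 11972, 4000: 3399,
    5000: 1303, 6000: 481, 7000: 264, 8000: 138, 9000: 93, 10000: 66,
    11000: 38, 12000: 22, 13000: 21, 14000: 17, 15000: 6, 16000: 12,
    17000: 9, 18000: 14, 19000: 4, 20000: 4, 21000: 1, 22000: 2,
    23000: 1, 24000: 4, 25000: 3, 26000: 2, 27000: 1, 30000: 2,
    32000: 1, 33000: 2, 34000: 1, 37000: 1, 38000: 1,
}


def rejected_range(limit):
    """Return (fewest, most) messages that would exceed `limit` chars.

    ROUND(n, -3) puts lengths b-500 .. b+499 into bucket b, so a bucket
    counts toward `fewest` only if its shortest possible message is over
    the limit, and toward `most` if its longest possible message is.
    """
    fewest = sum(n for b, n in HISTOGRAM.items() if b - 500 > limit)
    most = sum(n for b, n in HISTOGRAM.items() if b + 499 > limit)
    return fewest, most


for limit in (32000, 16000):
    fewest, most = rejected_range(limit)
    print(f"limit {limit:>6,}: between {fewest} and {most} rejected")
```

An exact count would need a query against the table itself; this only bounds the answer from what was pasted into the email.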


——————————


Epilogue (not from my email to the friend):


Look how much fun it is to program SQL.  Three lines of code and you get an interesting answer (and those three lines would have been much cleaner and simpler if we hadn’t been forced to use the CLOB datatype, which has its own strange accessor functions).  Compare to Java and C where typing until your fingers fall off usually doesn’t result in much of anything.  SQL, Lisp, and Haskell are the only programming languages that I’ve seen where one spends more time thinking than typing.


Mixter, a 6.171 project, launches out of Creative Commons

CC Mixter, at http://ccmixter.org/, has been launched by Creative Commons.  This is a service for musicians to make their work available for sampling, remixes, mash-ups, and other purposes that a middle-aged Boston Symphony Orchestra subscriber wouldn’t understand.  What I do understand, however, is that this system was built as a project in MIT class 6.171 (Software Engineering for Internet Applications) by Ian Spivey and Matt Drake.  It is exciting to see the service apparently live and well.


[Many postings today due to 30-knot wind gusts blowing away my helicopter lessons.]


Maybe women wouldn’t want to get married if they knew how time-consuming it was

A 40ish friend told me about life with his twentysomething girlfriend:



  • “I plan the dinner, shop for all the ingredients, choose and buy the wine, cook and clean up.”
  • “We were staying at a friend’s house.  When it came time to leave she was relaxing while I cleaned up and put the sheets and towels in the washing machine.”
  • “We earn about the same amount of money and yet I pay for everything.”
  • “She seems to think equality means doing less than half the work so she won’t ever have to feel mid-twentieth century housewifelike.”

At the same time we know a huge number of women who seem to be good at everything except holding onto a guy long enough to get married, something that they claim to want.  Could it be the case that in the old days mothers sat their daughters down and explained to them how selfish and spoiled most men are and what they needed to do to keep the guy happy?  Whereas now young women are exposed to wisdom from Jada Pinkett Smith, a popular actress:



“Women, you can have it all—a loving man, devoted husband, loving children, a fabulous career,” she said. “They say you gotta choose. Nah, nah, nah. We are a new generation of women. We got to set a new standard of rules around here. You can do whatever it is you want. All you have to do is want it.” (speaking at Harvard, a talk that got her into hot water for being too “heteronormative”)


The implication of Pinkett Smith’s remarks was that a Harvard girl, in virtue of being bright, well-educated, and ambitious, was entitled to these things without doing too much work except maybe on the career part.  She never added “if you’re willing to do the laundry, plan and pay for half the evenings out, straighten up the house in between visits from the cleaners.”


Some of our women friends do seem to have figured out what compromises and efforts are entailed but they suffer through many inexplicable (to them) dumpings and are into their late 30s by the time insight is acquired.  This wouldn’t be a problem except that by this age they are past their best reproductive years and are often rather embittered toward men.


A potential solution:  Find couples where the man is satisfied and not thinking about walking out.  Do time-and-motion studies of the female partner in these couples and figure out how much effort they are putting forth.  Report the results so that single women can decide if it is worth the bother.  Perhaps when they see the data they will come to agree with Isabel Archer in Portrait of a Lady, who, when all around her were trying to marry her off, thought



“she held that a woman ought to be able to make up her life in singleness, and that it was perfectly possible to be happy without the society of a more or less coarse-minded person of another sex.”


[Update:  Some (married) friends pointed out that there are quite a few books targeted at women who want to get married, offering advice.  However this advice is anecdotal and not based on hard numbers gleaned from surveys.  A friend in her 40s pointed out that perhaps by coddling our kids we’ve produced an entire generation too selfish to make a marriage succeed.  This afflicts both young men and women equally with the difference that men have the biological luxury to wait until they are 40 or 50 to figure it out.]


How girls learn about opportunities in math, science, and engineering

A 17-year-old polo champion is visiting us from Argentina and today was my day to give her the grand tour of Boston.  Naturally the MIT campus was on our agenda.  MIT’s new president, Susan Hockfield, rather than doing something interesting like starting a medical school, has made beating up on Larry Summers her first public action, over his musings on why there aren’t an equal number of women and men in super nerdy academic jobs.  Hockfield says that “The question we must ask as a society is not ‘can women excel in math, science and engineering?’ but ‘how can we encourage more women with exceptional abilities to pursue careers in these fields?’”  I felt proud to be doing my share.  I had brought a 17-year-old girl who can do anything she wants to with her life onto the MIT campus to be inspired.  What happened?  Just downstairs from Hockfield’s office we ran into a woman who recently completed a Ph.D. in Aero/Astro, probably the most rigorous engineering department at MIT.  What did the woman engineer say to the 17-year-old?  “I’m not sure if I’ll be able to get any job at all.  There are only about 10 universities that hire people in my area and the last one to have a job opening had more than 800 applicants.”


[Spending the day with a young person is fraught with potential for humiliation.  She looked at my collection of 2000 LP records and asked “What are those?”  When I explained that they were records, she asked “What are records?”  It is too bad that the Supreme Court won’t let us execute 17-year-olds anymore…]


Merchant of Venice, the movie

Just back from seeing Merchant of Venice on the silver screen.  It is amazing how badly behaved nearly all of the characters are.  Shylock, mostly referred to as “the Jew” and addressed as “Jew,…”, is bitter and unwilling to forgive all the times the Christians have spit on him.  Shylock’s daughter is ungrateful for all of his loving care and trust and happy to run off and never see the old man again for the rest of her life.  The young Christian gentleman is typified by Bassanio, who squandered his fortune on high living and who decides to find a rich chick to marry so that he can pay his debts.  The rich chick Portia impersonates a judge so that she can help the rest of the Christians cheat Shylock out of the 3000 ducats he lent for the fortune-hunting expedition plus the rest of his wealth.  The only person in the entire play who behaves creditably is Antonio, the actual Merchant of Venice in the title.


It is tough to argue with a cast that includes Al Pacino and Jeremy Irons.  Teenage boys will also want to feign an interest in classic theater in order to get into this film, which covers a period in history where all women displayed either beautiful cleavage or entirely bare breasts.


[Those who complain that Shakespeare painted the Jews in a negative light should be reminded that Shakespeare almost certainly never met a Jew.  The Jews were expelled from England in 1290, with their property confiscated by the king.  Shakespeare finished Merchant of Venice in 1597.  Jews were re-admitted to England in 1655.]


James Dean died in a Porsche and boosted sales; what about JFK, Jr. and Piper?

At the Ralph Lauren car exhibit at the Boston Museum of Fine Arts, which opens to non-members on March 6, a plaque next to a 1955 Porsche 550 Spyder contains the following:



“In September 1955 legendary actor James Dean … crashed his new 550 Spyder and was killed.  This tragic event immortalized the Porsche name and transformed a relatively small company into a very big business.”


So… if it worked for Porsche with James Dean, how come it didn’t work for Piper when JFK, Jr. crashed his Saratoga?  If anything you’d expect the truck-like family man’s 6-seater Saratoga to have fared better than the rear-engined Porsche, which was notorious for hard-to-handle oversteer.


[Don’t rush down to the MFA to see this exhibit.  There are much more interesting car collections at a lot of the U.S.’s car museums, including the one 30 miles west in Stow, Massachusetts at the Collings Foundation.]


Suggestions for a trip to Portugal?

I’m considering a trip to Portugal on Friday March 25.  A friend is coming with me and she has to return on Sunday April 3.  I have more flexibility and could stay on.  Some questions for Portugal veterans…



  • Is the end of March a nice time of year to be in Portugal?

  • Can one stay the whole eight days in one hotel in Lisbon and make day trips or would it be better to stay in several different places (and, if so, what are one or two favorite places)?  I don’t want to spend too much time in transit.

  • If we are going to be moving around, is it best to rent a car?

Thanks for the help!


Question for pilots: What options to order on a Cirrus SR-20?

I ordered a Cirrus SR-20 yesterday, to be shared with a friend.  I’m still looking for the ideal Malibu to purchase but this gives us something fun to fly around New England, is very cheap to operate, and I may want to use it to do flight instruction.  I’m currently working on my CFI/CFII ratings and think it would be fun to teach instrument flying on 14-day cross-country trips with guys who want to buy a Cirrus but lack the instrument rating or the time in type that will comfort insurers (the Cirrus has a terrible fatal accident record, which is ironic because it has been marketed from the start as an especially safe airplane with its emergency parachute, etc.).  So the question becomes how to equip this airplane.  It will probably be resold after 3 years so that I can always be teaching in a plane that has comparable avionics to the new ones.  Therefore we don’t want to go overboard on cramming this simple airframe with Boeing 757-grade avionics that won’t earn their value back on a resale.


We were thinking of the following options:



  • leather seats (the dog needs his comfort)
  • 3-blade prop (smaller diameter ergo lower tip speeds ergo lower noise for the dog, who doesn’t wear headsets)
  • MFD upgrade to 5000C so that we can get the weatherlink
  • weather datalink
  • Emax engine monitor
  • 3rd year extended warranty including avionics

This leaves us with a plane that is $260,000.  We decided against the Stormscope because we don’t intend to fly anywhere near thunderstorms and the NEXRAD datalink should be good enough.  We decided against the Skywatch system because it is $21,500 and we think that in the long run we can swap the transponder for a Mode-S unit for maybe $2000 (Cirrus doesn’t currently offer this option) and get the TIS feed from the FAA RADAR.  We decided against the $11,500 ground prox warning system because we think that the Garmin 430 will give this to us by mid-2005 with a cheap upgrade.


The open question is whether to spend $19,000 extra for the double Garmin 430s and the fancier 55X autopilot and flight director.  The stock SR20 comes with a backup Garmin 250XL GPS that is VFR-only and has no VOR or ILS receiver and only a 5-watt radio transmitter.  Its autopilot does not have altitude preselect and can’t fly an ILS approach.  With the upgrade you get two identical GPS/VOR-ILS/COM units and don’t have to learn a different user interface.  If you do get stuck by yourself in ugly weather you can have the autopilot fly an approach while supervising and adjusting power.  And the flight director is awfully nice for when something goes wrong with the autopilot’s servos but you’re still in the clouds.


Thoughts from more experienced pilots?


[Update:  Thanks for the advice from all commenters.  We decided to go for the dual-430s, the fancy autopilot, and flight director.  Cirrus tells us that the plane will be delivered in mid-May, i.e., about three months after we placed our deposit.]


What car for a young man starting a career in Los Angeles?

A young friend is starting a career in Los Angeles.  His goals are the following:



  • fun to drive
  • favorably impress superiors at work
  • appeal to single women
  • spend less than $60,000 and ideally much less

What should he buy?


[Update:  I hadn’t wanted to prejudice anyone so I withheld my suggestion… the new Ford Mustang convertible.  He will be in California, where driving on the backroads in a convertible can be a lot of fun.  The fanciest V6 convertible lists for only $25,000.  There is a small back seat for a Golden Retriever.  The solid rear axle isn’t an issue considering that California has no potholes.  He doesn’t need the big V8 engine because California traffic is generally so heavy that he’ll be lucky to hit 30 mph.  To me this is a fun car that doesn’t say “I’m trying to impress you with my wealth” (which is always a failure in LA because there is always someone richer in the next lane with a $100,000+ car).  What do folks think of the Mustang idea?  Am I totally out of touch with youth?]


Alcoholism = stepping stone to success (plus why we must all move to Switzerland)

This article on the founder of IKEA reveals some interesting tidbits…



  • one can move to Switzerland and negotiate a fixed income tax rate related to the value of one’s house

  • Ingvar Kamprad, the founder of IKEA, is richer than Bill Gates now, partly because of the slide of the dollar and partly because Billg has been giving money away; the 77-year-old guy is worth roughly $50 billion

  • like our local hero George W., Mr. Kamprad has had trouble with alcoholism (perhaps we need to encourage young people to drink more?)

Alcohol is supposed to be so bad for brain cells, productivity, etc.  How can we explain the fact that so many hypersuccessful people are or were alcoholics?
