Reader's Comments on Data Warehousing
There actually are some fairly good data warehouse tutorials on the net. But I'd rather read a good book (like philg's, for example), so go out and buy Ralph Kimball's. He's the one who really understood that a *big* business opportunity existed where Oracle and DB2 (not to mention older "legacy" databases) feared to tread -- producing sales reports. Actually his stuff is good and readable and full of useful examples and even a little mini-warehouse system he wrote in Microsoft Access (a feat worth a medal in itself).

I went to a technical presentation at the Oregon Graduate Institute about a year ago -- their public seminars used to be on things like the internals of Chorus (an early-90s French variant of Mach) or n-dimensional object-oriented whoozis or whatever. Now that it's the Era of the Net, of course, these talks have turned into thinly disguised marketing pitches.

And at this one, the pitch was for a local company developing a "data cube" front-end analysis tool for the aforementioned Moby Data Warehouses, like Kimball's Red Brick or the equivalent in Oracle or what-have-you. The example the guy used to show how great their tool is: Walmart used data warehouses to figure out that their sales of diapers and beer go up at 5 pm, so they put them in the display areas at the front of the store. Why? Because dad is on his way home and gets a call from mom on the cell phone to remind him about picking up the diapers, so of course he then picks up a six pack at the same time. And this takes $20 million and a cafeteria full of DBAs, sysadmins, financial specialists, data extracting/mining/tuning/cleaning/ignoring specialists and so forth to figure out.

I'm being a bit unkind about the corporate value of data warehouses. The presenter also, after vigorous questioning, admitted *another* result of data warehouses. It seems that corporate America has never been able to relate the selling price of a retail item to its production cost. I know that is a big shock -- it was to me -- but it's true. In other words, if you have an EEE-wide shoe size, you can find it in the big stores because they know people require all kinds of shoe sizes and stock accordingly, even though some sizes move slowly or not at all.

Now come the data warehouse reports to inform regional sales managers that the marginal cost of producing those EEEs is higher than for the regular ones; in fact, it cuts significantly into the overall margins for shoes. So . . . they stop carrying your size and you have to go elsewhere at a much increased direct and indirect cost in time, car fumes and annoyance, not to mention the much larger lifetime sales loss to Big Mall Box, Inc. because they lost you as a customer forever.

This reminds me of the other efficient market created by computers that we know about -- the airline reservation racket, where highly-advanced theorists and programmers have created the yield management system to maximize airline ticket profits. This allows me to sit next to you and pay $400 for a round trip that cost you $1200, even though we bought our seats less than 24 hours apart. It has worked so well that the airlines are now being impelled to force small travel agents -- the bread and butter providers of business travelers and the backbone of their cash flow -- out of business. You say you prefer to book your flights with your friendly local travel agent? Sorry, go with Mega Agency or book 'em online . . . if you dare.

The moral of the story: no doubt data warehouses and yield management systems have their place, but never forget what their place is, especially when the well-dressed sales-, er, "corporate executive" is pitching you the latest Pet Rock, er, data cube presentation technology.

-- Fred Heutte, November 2, 1997

Only a couple days after my last comment, I was paging through the October 27 issue of Information Week and came across a Tandem ad, running double-truck across pages 54-55:

[First, you have to imagine a large thirty-ish guy standing in a doorway, wearing a diaper. Ok, got that?]

"AT 6:32 PM EVERY WEDNESDAY, OWEN BLY BUYS DIAPERS AND BEER. DO NOT JUDGE OWEN. ACCOMMODATE HIM."

"If a data mining query discovers that between 6 and 8pm men buy diapers and beer, chances are you'll see more diapers and beer. It's with this kind of valuable -- and sometimes odd -- information that Tandem is helping people in retail, banking, telecommunications and insurance uncover business opportunities."

This also uncovers a recurring theme in modern data processing, one that changed the traditional acronym of GIGO from "garbage in/garbage out" to "garbage in/gospel out".

http://www.journalism.wisc.edu/jargon/jargon_21.html

-- Fred Heutte, November 6, 1997

Whenever I see someone trying to explain or justify data warehousing, almost inevitably the Parable of the Beer and Diapers is brought up. It has entered the apocrypha of the data warehousing community as the single compelling example of the utility of data warehousing. In every case, it is attributed to a different corporation, often to some nameless "large retail store." It has all the markings of an Urban Legend, like the one about the guy who wakes up in a hotel in a foreign city in a bathtub full of ice, a splitting pain in his side, and a note written on the mirror saying "Call 911," his kidneys having been removed by organ bootleggers. Does anyone know if the Beer and Diapers story actually happened, or if some enterprising Tom Vu of the data warehousing world made it up?

-- Jin Choi, February 2, 1999

Here is where the Diapers/Beer NOT LEGEND came from:

From: Ronny Kohavi ronnyk@bluemartini.com
Date: Thu, 06 Jul 2000 22:16:36 -0700
Subject: Origin of "diapers and beer"

For my invited talk at ICML in 1998, I tracked the beer and diapers example further. Check out slide 21 in http://robotics.stanford.edu/~ronnyk/chasm.pdf

Basically, I found the person in Blischok's group who ran the queries. K. Heath ran self joins in SQL (1990), trying to find two itemsets that have baby items, which are particularly profitable. She found this beer and diapers pattern in their data of 50 stores over a day period.

When I talked to her, she mentioned that she didn't think the pattern was significant, but it was interesting.

-- Ronny

-- Tom Mathews, June 23, 2001
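
For anyone wondering what "self joins in SQL" to find pairs of items might look like, here is a minimal sketch. It is not the query K. Heath actually ran, which isn't shown anywhere in this thread. The table `basket_items`, its columns, and the sample rows are all made up for illustration, and Python's built-in sqlite3 module stands in for whatever database was used in 1990.

```python
import sqlite3


def pair_counts(rows):
    """Count how many baskets contain each pair of items.

    rows: iterable of (basket_id, item) tuples. The table and column
    names are hypothetical, chosen only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE basket_items (basket_id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO basket_items VALUES (?, ?)", rows)
    # Self join: pair every row with the other rows from the same basket.
    # a.item < b.item keeps each pair once and drops (x, x) pairs.
    query = """
        SELECT a.item, b.item, COUNT(DISTINCT a.basket_id) AS baskets
        FROM basket_items a
        JOIN basket_items b
          ON a.basket_id = b.basket_id
         AND a.item < b.item
        GROUP BY a.item, b.item
        ORDER BY baskets DESC, a.item, b.item
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up sample data: four baskets.
sample = [
    (1, "beer"), (1, "diapers"), (1, "chips"),
    (2, "beer"), (2, "diapers"),
    (3, "diapers"), (3, "wipes"),
    (4, "beer"), (4, "chips"),
]

for first, second, baskets in pair_counts(sample):
    print(first, second, baskets)
```

Note that a query like this only counts co-occurrences. It says nothing about whether a pair shows up more often than chance would predict, which fits the remark above that the pattern was interesting but not necessarily significant.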
