User Activity Analysis
Assume the following data model:
create table users (
user_id integer primary key,
registration_date not null date default sysdate,
state char(2) references us_postal_abbrevs,
country char(2) references iso_country_codes,
...
);
-- we support threaded discussions on this server
-- (i.e., one can respond to a response), with a
-- content_type of "forum_posting"
create table content_raw (
content_id integer primary key,
content_type varchar(100) not null,
refers_to references content_raw,
creation_user not null references users,
creation_date not null date,
...
);
create table content_versions (
version_id integer primary key,
content_id not null references content_raw,
...
editorial_status varchar(30)
check (editorial_status in ('submitted','rejected','approved','expired')),
...
);
create table content_access (
user_id not null references users,
content_id not null references content_raw,
access_time not null date,
-- we're only going to store one row even if a user
-- reads the item several times
primary key(user_id, content_id)
);
-- helper tables
-- this table contains a row if a date is a holiday
-- the Oracle date datatype is precise to the second
-- so we use midnight at the beginning of the holiday
create table holidays (
holiday_date not null date,
);
-- a bunch more helper tables with similar information
...
Build SQL queries to answer the following queries:
- I'd like to see a 24-hour hourly histogram of posting activity in
the discussion forum. -- Creative Commons
- compare average number of postings by new users (registered within
past month) versus older users -- Cambridge Ent. Club
- compare total activity (average number of postings) on holidays
versus non-holidays -- Dspace
- does the average New Yorker post more than the average user from a
southern state? -- greenspun.com
- compare the posting activity, total and average, from users in
European countries versus users in Asia -- Condo
- are users who've read the Site FAQ (
content_id = 177)
more or less likely to post an item that required rejection by a
moderator? -- HP
- compare the number of users participating in flame wars versus those
who abstain (a flame war is defined as any discussion forum thread that
extends beyond 25 postings) -- Technique
- at 6:00 pm on a typical day, how many simultaneous discussion
threads are alive? -- Ziff-Davis