Saturday, December 31, 2005

New Year quiz

A few days ago I was asked to analyze some data related to a particular service usage in a large communications provider. The (simplified) table looks like this:
CREATE TABLE `xusage` (
`item_id` int(11) NOT NULL,
`begin_date` date NOT NULL,
`xusage` int(11) default NULL,
PRIMARY KEY (`item_id`,`begin_date`)
) ;
And here is some sample data:
select * from xusage order by begin_date,item_id;
+---------+------------+--------+
| item_id | begin_date | xusage |
+---------+------------+--------+
| 1 | 2005-09-01 | 93 |
| 1 | 2005-09-02 | 90 |
| 2 | 2005-09-02 | 54 |
| 1 | 2005-09-03 | 66 |
| 2 | 2005-09-03 | 74 |
| 3 | 2005-09-03 | 90 |
| 1 | 2005-09-04 | 38 |
| 2 | 2005-09-04 | 49 |
| 3 | 2005-09-04 | 91 |
| 4 | 2005-09-04 | 91 |
| 2 | 2005-09-05 | 47 |
| 3 | 2005-09-05 | 49 |
| 4 | 2005-09-05 | 33 |
| 5 | 2005-09-05 | 83 |
| 3 | 2005-09-06 | 23 |
| 4 | 2005-09-06 | 99 |
| 5 | 2005-09-06 | 93 |
| 4 | 2005-09-07 | 75 |
| 5 | 2005-09-07 | 47 |
| 5 | 2005-09-08 | 3 |
+---------+------------+--------+
The task at hand was to produce another table (for charting purposes), having the same columns as xusage, with some additional pieces of information: the total usage for that day and the following day, the usage for item_id on day +1, the percentage of such usage from the total of all usage in day + 1;
For example, in the given data, the total usage on September 1st was 93, and on September 2nd was 144. The wanted columns for the first two days would be
+---------+------------+--------+-----------+-------------+-------------+------------------+
| item_id | begin_date | xusage | total day | total day+1 | usage day+1 | percentage day+1 |
+---------+------------+--------+-----------+-------------+-------------+------------------+
| 1 | 2005-09-01 | 93 | 93 | 93 | 90 | 62.5000 |
| 1 | 2005-09-02 | 90 | 144 | 144 | 66 | 28.6957 |
| 2 | 2005-09-02 | 54 | 144 | 144 | 74 | 32.1739 |
+---------+------------+--------+-----------+-------------+-------------+------------------+
The original table had 50 million records. Therefore I had to find an efficient way of fulfilling the task.
It can be done with a single query, although performance would suffer. I chose to create a supporting table to ease the task.

My solution is here.
How would you have solved it? (BTW, there is no prize, unless Arjen wants to provide some.
Happy New Year!

2 comments:

Kim Seong said...

Is it better to have the column order of the primary key on (begin_date,item_id) instead.

gmax said...

Kim,
The primary is already set in the original table.
Rebuilding a 50-million records table
to change the primary key is hardly what I'd call an efficient approach. :)

Vote on Planet MySQL