Friday, July 28, 2006

An ugly query should be a warning bell

A recent question in a newsgroup was

Given this table in MySQL 4.0, how can I run a single query to sort the records by the number of pipes in myfield?
id(int) -------- myfield(varchar)

1 --------- |number1|number2|
2 --------- |number1|
3 --------- |number1|number2|number3|
4 --------- |number1|number2|number3|number4|

Suppressing my urge to say something about the data structure, I suggested this clever solution from a test sample I deduced from the request:

select * from mytable;
+----+-------------------+
| id | myfield |
+----+-------------------+
| 1 | 10|1|12|9 |
| 2 | 1|2 |
| 3 | 19|5|59|91 |
| 4 | 111|31|10|1|12|9 |
| 5 | 1|2|7 |
| 6 | 95|59|91 |
| 7 | 123456789|2000000 |
+----+-------------------+

select id, myfield,
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(myfield,
'9',''),
'8',''),
'7',''),
'6',''),
'5',''),
'4',''),
'3',''),
'2',''),
'1',''),
'0','') as so
from mytable
order by so;
+----+-------------------+-------+
| id | myfield | so |
+----+-------------------+-------+
| 2 | 1|2 | | |
| 7 | 123456789|2000000 | | |
| 5 | 1|2|7 | || |
| 6 | 95|59|91 | || |
| 1 | 10|1|12|9 | ||| |
| 3 | 19|5|59|91 | ||| |
| 4 | 111|31|10|1|12|9 | ||||| |
+----+-------------------+-------+

However, the poster added hastily that his request was wrong, and that what he identified as "numbers" are actually character strings of various length, thus making my solution much harder to find. The sample given was


1 --------- |ruote in lega|macchine gialle|
2 --------- |case in affitto|
3 --------- |treni|vagoni|camini|

But then I overcame my desire of showing off with my cunning solutions, and told him that, apart for the initial misunderstanding about numbers and strings, his table was stained with the unforgivable sin of not being even in 1NF.
Therefore, the right course of action would be to split the table:

contents
1 ruote in lega
2 macchine gialle
3 case in affitto
4 treni
5 vagoni
6 camini

mytable
ID id_content
1 1
1 2
2 3
3 4
3 5
3 6

This way, the ugly query to sort by "how many pipes" would be reduced to the simple

SELECT id, count(*) as how_many
from mytable inner join contents using (id_content)
group by id
order by how_many

Beware of ugly queries. Often they tell you that something in your structure design may be wrong.

2 comments:

gleam said...

A much easier way to count the number of |s is to do something like this:

select length(col)-length(replace(col,'|','')) as numpipes from mytable

Obviously I agree wholeheartedly that normalization was in order. But in terms of solving the original problem without modifying structures or anything else, the above method is probably much faster.

Erik said...

There are no unforgivable sins - somtimes a database design that violates 1NF could be a good solution.

I am currently considering a situation that resembles the one in the example. But we have the normalized version and are considering a move to the non-normalized. This would heavily reduce our database load, since we need the list all the time, and the select statement retrieving it is one of the most frequently run in the system.

The cost is of course that finding data using one of the values in the list will be somewhat more complicated (LIKE '%|ruote in lega|%') and could not be supported by standard indexing.

The benefit is that finding the complete list for an id will be much faster and database volume smaller.

Vote on Planet MySQL