Thursday, March 12, 2009

Normalization and smoking




An interesting question came yesterday at the end of the MySQL workshop in Athens. An user has a server with about 40,000 tables, and the performance is not brilliant (oh really?). The reason is that there is an application that requires one or more new tables for any user, and the tables are of about ten different structures.
The user asked the wrong question: "will the performance improve if I change the storage engine from InnoDB to MyISAM?"

Salle gave the first answer. Converting the tables to MyISAM would only exacerbate the problem. Instead of having one file per table, the user would have 3 per table. Any operating system would choke with such a number of files concentrated in a handful of directories.
My answer focused on normalization. I explained that, if 10 different table structures are used, then the user should create only 10 tables, add a column for user ID, and export the 40,000 tables into the 10 tables. When I explained that this reorganization would improve performance a lot, but it would require changing the application, the user tried to protest, asking for some silver bullet solution. Then I delivered the punch line: "Your business is growing, and as it grows, you are adding new table for every new user. If you continue like this, there will be so many tables that your operating system will collapse. Your database is doomed. It's like if your doctor tells you that if you don't quit smoking, you will die in six months. The same for your database. If you don't quit creating tables, your database will die unpleasantly. Quit smoking, now!"
That made quite an impression on the audience!

5 comments:

Mayavi said...

true .... and nice analogy ..

what do people think when they come up with these kinda granularity of data with separate table for each user ?

Did he mention what was the size of his each table ? and what was the reason behind creating separate tables for each user ?

Ranjeet Walunj
mayavi.info

Ronald Bradford said...

Alas, there are a number of projects that undertake this process. WordPress is one of note.

There are some performance tips you can use when your schema has a large number of tables.

Setting a large table_cache (monitor with Open_tables and Opened_tables), setting a large open_files_limit, realizing the OS may complain and then you have to look at ulimit for the mysql user.

Ensuring your tables are in different schemas will address the first issues of OS inode issues with simply too many files in a directly, however with too many schemas you may see issues simply at the OS level.

The advice in this post is an ideal approach however in this industry, we have to accept that optimizing design is not always possible and then we look for any improvements we can.



Check out our "MySQL Essentials" training course at 42SQL Education.

Giuseppe Maxia said...

@Mayavi,
Size of table is not important. The reason for creating separate tables is just a common beginner mistake.

@Ronald,
True, you can improve performance by fiddling with file limits, but 40,000 tables, which could become 80,000 in one more year, is not a sane proposition.

Giuseppe

Day after day we burn it all away said...

That was my impression too. The data was distributed in a set of 10 tables per user. Instead of using a "userid" and rows in 10 unique tables for all the users. So imagine 4000 users x 10 tables each, with the same data structure instead of 10 tables to handle them all.

@gmax + Salle, I have to admit that you handled the question very professionally and without any hint of making fun of the user.
Ok many people involved in db development, have an idea of how to structure a db; others just had a bad start.

Cheers

Unknown said...

We now have about 56 million tables on WordPress.com, adding 80k daily :)