Saturday, December 29, 2007

Ruby On Rails new aptitude


web development with RoR

So, I joined the Ruby on Rails crowd. One year ago I got a complimentary book from O'Reilly, and only this week I got around to read it. I have wanted to explore Ruby on Rails for quite a while, and today I had my first taste.
I put the book on my desk, and I started following the examples on my Linux desktop.

A great architecture.


It started very well. Unlike many freehand architectures, Ruby On Rails forces you to do the right thing. Separating the programming logic from the interface is something that is recommended in all programming books, but Ruby on Rails imposes its goodness on you. If you want to take advantage of the quick framework, you need to follow the rules.
MVC

Ruby On Rails MVC architecture is not new. The geniality of this implementation is to make it easy.

Going uphill

After the theoretical part, it was show time. So I followed the tutorial. Well, I tried.
It looks like Ruby on Rails 2.0 has changed a few defaults. As a result, the vast majority of great tutorials in the web and on books are broken. No kidding.

Default database

The first obstacle was easy. By default, Rails now uses SQLite3 instead of MySQL.
The rationale given in the announcement is that SQLite3 is easier to use than MySQL. That may be true, provided that your application stays confined to your desktop, and perhaps to three or four users. When your application scales as to require more than one web server, you face a migration of a production environment. True, Ruby on Rails assists you in this task, but my personal feeling is that planning is better than patching.
This one was easy to spot. Many tutorials recommend editing the database configuration file, and I did it, only to find out that it was not what I expected. The online help in my Mac says that the default is still MySQL.
$ rails --version
Rails 2.0.2
$ rails --help
Usage: /usr/bin/rails /path/to/your/app [options]

Options:
-r, --ruby=path Path to the Ruby binary of your choice (otherwise scripts use env, dispatchers current path).
Default: /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
-d, --database=name Preconfigure for selected database (options: mysql/oracle/postgresql/sqlite2/sqlite3).
Default: mysql
-f, --freeze Freeze Rails in vendor/rails from the gems generating the skeleton
Default: false
Anyway, the solution is to start with
$ rails -d mysql project_name
And we are back in track. The tutorial could resume.

scaffolds

But it turned out that the backend database was not the only significant change. This one kept me tied for about two hours.
Almost every tutorial illustrates the joy of publishing with Rails, by creating a simple table, and having its contents listed on the web, with a simple editing form to insert new contents and modify existing ones. It is a feature that attracts developers to Rails, and I was eager to enjoy the quick glory of instant publishing.
It didn't work. The tutorials all sing the same song:
  • generate a model
  • edit the model, adding the attributes
  • generate a controller
  • edit the controller, adding a scaffold line
  • run and marvel
It failed. It failed on Linux, and it failed on Mac OS X. Of course I though that I made some mistake. So I did everything afresh. It failed.
Then I thought that perhaps Rails had not been installed correctly, and thus I reinstalled Rails on both boxes, and started from scratch. It failed again.
I thought that maybe, given that the new default database was SQLite3, using it could make a difference, so I tried that one as well. It failed on Mac OSX just as before, and it did not started at all on Linux, because SQLite3 was not installed (it was just an "apt-get" away, but that it's beyond the point).
So I did what I should have done in the first place. I entered "ruby on rails scaffold" on a Google search textbox, and the first item in the result list gave me the answer. RoR 2.0 breaks compatibility with the past (and with the enthusiastic tutorials that are attracting its users).
The song now is :
  • generate a scaffold with attributes
  • generate a controller
  • run and marvel.
Which is admittedly much easier and painless. Too bad that the whole world was trying the old syntax, which was easy enough to appeal.
In the end I got my application running, and making it required just a few minutes, once I got the right moves.

Summing up

Well, I may have sounded a bit negative in my previous notes, but the truth is that, despite a few shortcomings, Ruby on Rail has a great future, and I am going to try the examples from the rest of the book.

Wednesday, December 19, 2007

Pop quiz (with prize): generate 4 billion records

My latest quiz was quite popular, and some interesting ideas were submitted.
There was an interesting development. A colleague called and asked me for advice on how to insert 4 billion rows in a table with a simple structure.
create table t1 (
id tinyint not null
);
Very simple. No primary key, no indexes. It is needed to perform some specific tests.
Actually, not 4 billion, but 2^32 records are needed, i.e. 4,294,967,296.
The classical method used in these cases is doubling the table contents:

insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1);
insert into t1 select * from t1;
insert into t1 select * from t1; # and so on
My solution was similar to the one from my quiz.
CREATE VIEW `v4` AS
select NULL
union all select NULL
union all select NULL
union all select NULL;

CREATE VIEW v1024 AS
select null from v4 a, v4 b, v4 c, v4 d, v4 e;

INSERT INTO t1 select 1 from v1024 a, v1024 b, v1024 c, v4 ;
This one is faster than the doubling method, but it still requires from 40 to 65 minutes, depending on how fast is your server.
So, the challenge, for which we will give away 1 MySQL T-shirt to the winner, is as following:
  • Generate 2^32 records for table t1;
  • no limits to the length of code to use;
  • you can use stored routines, temporary tables, views, events, whatever makes the insertion fast;
  • the method must be portable across operating systems. If it is not portable, a Unix-only method may be accepted if it is exceptionally faster than SQL-only solutions;
  • methods relying on external applications cannot be accepted;
  • If an programming language is needed, for compatibility with the test suite we can only accept Bash shell or Perl scripts;
  • If you devise a fast method to insert data using MySQL Proxy, a Lua script can be accepted.
  • what matters is the final insertion speed and ease of use.
  • If a method uses an external script, its speed must be more than 20% faster than the fastest method using only SQL.
  • The speed will be calculated on my server, using MySQL 5.1.23.

Solutions so far

  • Dpin. 20 minutes for a portable solution is very good.
  • Jedy. Very nice, but not that fast. 32 minutes.
  • Todd. Brilliant solution (10 minutes for 4 billion rows!), but really impractical. We need something that works for any engine. This one is a dirty trick that is fun to use once, but in the long run it won't stand.

Data from nothing - solution to pop quiz

My latest post on generating one million records received many comments, with interesting solutions.
The challenge required the insertion of 1 million records in a simple table, with a few constraints.
create table t1 (
dt datetime not null,
primary key (dt)
);

The solution

The official solution is straightforward:
create view v3 as select null union all select null union all select null;
create view v10 as select null from v3 a, v3 b union all select null;
create view v1000 as select null from v10 a, v10 b, v10 c;
set @n = 0;
insert into t1 select now()-interval @n:=@n+1 second from v1000 a,v1000 b;
You can appreciate the title here. Data from nothing. Generate 1 million valid records from a view made of NULLs!
The principle is easy.
  1. First create a few values in a view, by means of UNION ALL queries.
  2. Then, using Cartesian products, generate a larger view.
  3. And a larger one, containing 1,000 self generating rows.
  4. The final step is a simple Cartesian product in a self join.
This is something that they told you not to do in Database 101 classes, but it can be handy sometimes!

It can be reduced to 4 lines:
create view v3 as select 1 n union all select 1 union all select 1;
create view v as select 1 n from v3 a, v3 b union all select 1;
set @n = 0;
insert t1 select now()-interval @n:=@n+1 second from v a,v b,v c,v d,v e,v;

The other solutions

The solution that comes closer to the intended one comes from Shane Bester.
set @a:=1167602400,@b:=1168602400;
create view v as select 1 union select 2 union select 3 union select 4;
create view x as select 1 from v,v a,v b,v c,v d;
insert t1 select(from_unixtime(@a:=@a+1))from x,x a where @a<@b;
Shane also produced the fastest solution.
create view v as select a.dt from t1,t1 a,t1 b,t1 c,t1 d,t1 e,t1 f,t1 g;
replace t1 values('990101'),('980101'),('970101'),(@a:=1);
replace t1 select (adddate(a.dt,@a:=@a+1)) from v,v a limit 999996;

B.Steinbrink produced the shortest one.
  INSERT t1 VALUES(@a:=72620211),(101),(102),(103),(104),(105),(106),(107);
replace t1 select@a:=adddate(@a,1)from t1,t1 a,t1 b,t1 c,t1 d,t1 e,t1 f;

Roland Bouman produced the most wicked one. And a method that caught me by surprise.
# WARNING! DON'T RUN IF YOUR SERVER DOES NOT HAVE AT LEAST 4 GB RAM !!!
set max_allowed_packet := 1073741824;
set @s=concat('insert into t1 values(@d:=10000101+@i:=1)');
set @s=concat(@s,repeat(',(date_add(@d,interval @i:=@i+1 day))',999999));
prepare s from @s;
execute s;
In 5 lines, he is generating a 37 MB query containing 1 million records!
Not very efficient, but it works, if you have a huge amount of RAM. Otherwise, it can crash your server!
Kai Voigt came up with the same idea, with a faster execution.
set @a=concat(repeat(concat(@s:="select 1 ","union all "),99),@s),@x:=0;
set @c=concat(@s,"from (",@a,")a join(",@a,")b join(",@a,")c"),@i="inser";
set @d=concat(@i,"t into t1 select from_unixtime(@x:=@x+1) from(",@c,")c");
prepare s from @d;
execute s;
Jedy solution was the fastest until Shane's arrived.
insert into t1 values (now()-2),(now()-1),(now()),(now()+(@a:=1));
insert into t1 select adddate(now(),@a:=@a+1) from t1 a,t1 b,t1 c,t1 d,t1;
insert into t1 select adddate(now(),@a:=@a+1) from t1,t1 b limit 998972;
You can see all the other solutions as comments to the previous post.
This post comes earlier than promised, because of an unexpected development. A new quiz is coming. Stay tuned!

Monday, December 17, 2007

Pop quiz: generate 1 million records

This is a quiz that has a aha! solution. Not so trivial, though. It requires some thinking.

Given this table:
create table t1 (
dt datetime not null,
primary key (dt)
);
Task: insert exactly 1 million records in table t1, with the following constraints:
  • Use a maximum of 5 (five) SQL statements;
  • Use only the MySQL interactive command line client;
  • No shell commands;
  • No loading of scripts;
  • No inserts from existing tables in your system. You must assume that t1 is the only table in your entire database;
  • No MySQL Proxy;
  • No stored routines, triggers or events;
  • Each statement must be not longer than 75 characters;
  • UPDATE. No modification of table t1;
  • No LOAD DATA.
Prize: fame and fortune (i.e. your name quoted in these columns).
I will publish the solution at the end of the week.

To make sure that I am not cheating, here is the MD5 signature of the solution file that I will publish this week.
fc6d32faf19b5ac1064093a6d7969f7c  solution.txt
If you are paranoid and believe that I can create an arbitrary file and make its contents match with the above MD5 signature, you have until Friday to get the solution from that. :)

Update: To keep the challenge interesting, I won't publish the comments with the right solutions for a few days. If you have sent a comment, don't worry if it does not show up immediately. I will publish it soon before the final solution.

Solutions so far

  • Shane. Three lines. Less than 6 seconds. And close to the original solution! (Your third solution is almost the same as the intended one)
  • Jedy. Three lines! and less than 6 seconds to execute! on top.
  • Dipin. Your latest solution (3 lines) tops the list.
  • Kai Voigt. one solution with the crowd, and a wicked one like Roland's, but much faster!
  • Roland Bouman. This is the wickedest solution so far. Not the shortest, but it's the one that is totally different from the intended one (and the other solutions). It will crash most weak servers, though.
  • Carsten. In the same league as Kai and Roland for the wicked solution.
  • Dirk1231. The challenge requires not to use other tables. (Also your second solution does). The third attempt put you finally in the list! Your fourth solution is nice, but not enough to climb to the top.
  • Sergey Zhuravlev. Excellent! Can you do it without creating a table? (your second solution is very nice and imaginative)
  • WSchwach. Your solution qualifies as cheating. Altering the given table to insert duplicate records is not a valid solution. Good shot, nonetheless!
  • Morgan Tocker. Very creative! That's very good.
  • Matthias. Good use of the allotted characters.
  • Ephes. Not bad. But you are not assuming that t1 is the only table in your system. Will you try without creating tables?
  • Hubert. Nice try. One of the elements you mentioned will lead you to the right track. Keep trying.
  • Bill Karwin. I did not specify that the numbers should be contiguous, and you took advantage of it! Well done!
  • Domas. Good solution. I expected more from you.
  • Tobias. Your first solution runs forever. The second one is cheating (using a information_schema table)
  • Erik. Nope. No other tables allowed. Not even information_schema tables.
  • William: Nope. No stored routines allowed. And the instruction to create the routine is longer than 75 characters. Your second solution has a command longer than 75 chars, and it's using other tables, and it does not work either!

Some hints

Keep trying, and consider that my intended solution does the following:
  • Inserts contiguous records;
  • Uses all dates in this century;
  • Does not insert from ANY table, not even t1;
  • Does not use LIMIT;
  • The total execution time is below 7 seconds.
UPDATE: Solution online!

Sunday, December 02, 2007

MySQL University - Introducing Lua for Proxy scripting

MySQL University
Mark your calendars!
On Thursday, December 13th at 15:00 CET (14:00 UTC, 09:00 EDT, 06:00 PST) I will host a session of MySQL University, on the topic of Introducing Lua for MySQL Proxy scripting.

For those who missed the previous announcements, MySQL University is a series of online expert lessons that you can join for free and attend from the comfort of your home or office. The slides are provided in either PDF of wiki pages, the audio is an ogg stream, and you can interact with the lecturer via IRC.
If you have heard of MySQL Proxy but haven't got the time of getting involved with it yet, this session is for you. If you were interested but you thought that another scripting language would be too difficult, give this session a chance. Enroll now!

Vote on Planet MySQL