Tuesday, June 29, 2010

OpenSQLCamp EU 2010 - Last days to submit a talk

There is still time and hope if you want to present at the OpenSQLCamp 2010, European edition. The Call for Participation is open until July 11th.
Thanks for the ones who have submitted proposals.
A few caveats: if your proposal does not include a description, it will not be accepted. Therefore, wannabe speakers, please check your proposals, and make them as good as you can!
We want talk about all open source database. Not only MySQL.
Open source database fans, wake up, and submit a proposal!

Monday, June 28, 2010

MySQL Sandbox embraces Python and meets Cluster

If you have tried Quick start guides: MySQL cluster in 10 minutes, you may have realized that it is really quick and easy.
However, it leaves some typing to be done.
Users of MySQL Sandbox have a horror of repetitive typing, and this got me thinking. "Could I integrate MySQL Sandbox and Cluster?"
The answer was: "Sure."
But then I started thinking of all the minor and major changes that I wanted to do to the Sandbox and have delayed for too long. What I need, is a radical refactoring.
And then I remembered that it has been almost two years since I learned a new programming language and that perhaps I could expand my horizons and the Sandbox architecture at once.
Thus, thanks to an irresistible offer from O'reilly about ebooks, last week I bought both Learning Python, fourth edition and Programming Python, Third edition.
During the week end I produced my first tool: a Python script that installs and starts a small cluster, following the instructions given in the MySQL Cluster quick start guides. The script unpacks the cluster tarball, installs a server sandbox from it, then starts the cluster nodes and the MySQL server, and then it monitors the cluster until all the nodes are connected before finally testing the server.
Here is a sample run:

$ make_cluster.py ~/Downloads/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64.tar.gz

++ tar -xzf ~/Downloads/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64.tar.gz

++ low_level_make_sandbox --basedir=$HOME/python/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64 --sandbox_directory=mcluster --install_version=5.1 --sandbox_port=5144 --no_ver_after_name --no_run --force --my_clause=log-error=msandbox.err --my_clause=ndbcluster 
    The MySQL Sandbox,  version 3.0.12
    (C) 2006-2010 Giuseppe Maxia
installing with the following parameters:
upper_directory                = $HOME/sandboxes
sandbox_directory              = mcluster
sandbox_port                   = 5144
check_port                     = 
no_check_port                  = 
datadir_from                   = script
install_version                = 5.1
basedir                        = $HOME/python/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64
tmpdir                         = 
my_file                        = 
operating_system_user          = gmax
db_user                        = msandbox
db_password                    = msandbox
my_clause                      = log-error=msandbox.err ; ndbcluster
prompt_prefix                  = mysql
prompt_body                    =  [\h] {\u} (\d) > 
force                          = 1
no_ver_after_name              = 1
verbose                        = 
load_grants                    = 1
no_load_grants                 = 
no_run                         = 1
no_show                        = 
loading grants
.. sandbox server started
stopping server
Your sandbox server was installed in $HOME/sandboxes/mcluster

++ mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndb_mgmd -f $HOME/sandboxes/mcluster/my_cluster/conf/config.ini --initial --configdir=$HOME/sandboxes/mcluster/my_cluster/conf/
2010-06-28 21:29:57 [MgmtSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.44 ndb-7.1.4b
2010-06-28 21:29:57 [MgmtSrvr] INFO     -- Reading cluster configuration from '$HOME/sandboxes/mcluster/my_cluster/conf/config.ini'

++ mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndbd -c localhost:1186
2010-06-28 21:29:57 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1

++ mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndbd -c localhost:1186
2010-06-28 21:29:57 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1

++ mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 (not connected, accepting connect from localhost)
id=4 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @localhost  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from any host)


++ $HOME/sandboxes/mcluster/clear

++ $HOME/sandboxes/mcluster/start
... sandbox server started
Please wait. Giving the cluster time to catch up
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0, Master)
id=4 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @localhost  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from any host)


It may take up to 2 minutes to initialize ... ( 0 )
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0, Master)
id=4 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @localhost  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from any host)


It may take up to 2 minutes to initialize ... ( 5 )
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0, Master)
id=4 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @localhost  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from any host)

[...]

It may take up to 2 minutes to initialize ... ( 100 )
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
id=4 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4)


++ $HOME/sandboxes/mcluster/use -vvv -e "create table test.t1(i int not null primary key)engine=ndb"
--------------
create table test.t1(i int not null primary key)engine=ndb
--------------
Query OK, 0 rows affected (0.45 sec)

++ $HOME/sandboxes/mcluster/use -vvv -e "show create table test.t1\G"
--------------
show create table test.t1
--------------

*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `i` int(11) NOT NULL,
  PRIMARY KEY (`i`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

to shut down the cluster, type:
$HOME/python/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndb_mgm -e shutdown
It works! Ans this is a good start to make me feel confident with Python, which I will use to develop MySQL Sandbox version 4. This cluster snippet will probably be made from scratch once the new architecture is in place. For now, it was enough to get the feeling of the language.

Friday, June 25, 2010

Quick start guides: MySQL cluster in 10 minutes

MySQL Cluster quick start guide Scared of MySQL Cluster?
Don't be. You may want to try the quick start guides that are available in the Cluster downloads page.
These guides are a step-by-step instructions set to create a simple cluster in one host.
Following the instructions, you will be able to shape up all the cluster components in 10 minutes or less. This will be far from production ready, of course, but it shows that cluster is not rocket science, and anyone can get started with it with a minimal time investment.
I tried the Linux instructions on my Mac, and it worked without need for any changes. Things may be different when you deploy a real set of servers on separate hosts, but it's a good start.
If I compare this guide with my first experience with MySQL Cluster in 2004, usability and stability have increased enormously.

Saturday, June 19, 2010

Welcome googleCL

I am writing this blog post with Vim, my favorite editor, instead of using the online editor offered by blogger. And I am uploading this post to my Blogger account using Google CL a tool that lets you use Google services from the command line.
I am a command line geek, and as soon as I saw the announcement, I installed it in my laptop. The mere fact that you are reading this blog post shows that it works.

GoogleCL is an apparently simple application. If you install it on Mac using macports you realize how many dependencies it has and how much complexity it gives under the hood.
Using an easy to understand syntax, it allows you to access your blog, pictures, calendar, contacts, videos, and online documents at your fingertips.
For example, let's query my blog for partitioning:

$ google blogger --blog="The Data Charmer" --title=partitioning list "title,url"

Hmm. No results. The manual doesn't help much, but something happened during this query. The first thing ist that I was asked to authorize the script to access my blog, and that was done by activating a key that I got in the command line. So far, so good. The second thing was a message informing me that a default configuration file was created in my home directory. Looking at that file, I saw an option saying "regex = True". Aha! So the title supports regular expressions. Let's try:

$ google blogger --blog="The Data Charmer" --title=".*partitioning" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1 Improving ARCHIVE performance with partitioning

OK. This gives me everything with the word "partitioning" in the title. But I know that some titles are missing. Comparing with the results that I get online, I see that the titles where "partitioning" is capitalized are not reported. So the search is case sensitive. What I need to do is to tell the regular expression that I want a case insensitive search. Fortunately, I know how to speak regular expressions. Let's try again.

$ google blogger --blog="The Data Charmer" --title="(?i).*partitioning.*" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
Partitioning with non integer values using triggers
Tutorial on Partitioning at the MySQL Users Conference 2009
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1 Improving ARCHIVE performance with partitioning

Now I feel confident enough to do some changes to my online contents.
To create this blog post, I used some of googlecl capabilities. After I created an image, I uploaded it to my Picasa album using this command:

$google picasa post -n "Blogger Pictures" -t googlecl ~/Desktop/google_cl.png

Then I asked Picasa to give me the URL of the image:

$ google picasa list -n "Blogger Pictures" --query googlecl title,url_direct
google_cl.png,https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigTW09cp-kY2qB2l0zbiCMfM3LTN-ZbjlrhKH0wlxFgOqodQ_Mf5D8aSlq_w2Hv4A2aJiHgqbQR3J8gNiUmmWgTAAPO4AHvyHDHUx6GbCZBlIZQL7QYO6JBxepsk9XuDx8q3Q8/

And then I inserted that URL in this blog post. Finally, I uploaded the blog post with this command:

google blogger --blog="The Data Charmer" --draft --title "Welcome googleCL" --tags="google,mysql,partitioning,command line,blogging" post ~/blog/welcome_googlecl.html


(Now writing online) And after I checked that the post was looking as I wanted it, I hit the "PUBLISH POST" button.
Welcome, GoogleCL!

Sunday, June 13, 2010

Free software and business in Sardinia

jAPSOn June 10th and 11th I attended two interesting workshops, both related to the Italian Free Software conference, and both organized by private institutions that have expanded their boundaries and created two quite lively international events.

The first event was held near Pula, in an innovation laboratory encased in a beautiful forest. The Technological Park of Sardegna Ricerche is a place where research and business boil together to produce new companies and to help consolidate existing ones.
The workshop was titled A community for a new business model and it was a show case of a dozen companies, both local and from abroad. After the presentations and a lovely open air lunch, there were 1-to-1 20 minutes meetings between companies, to dive deeper in technologies, ideas, and friendships. I must say that I enjoyed the event, not only because my presentation on MySQL community and values was quite well received, but also because I met interesting people and found some new ideas that need further exploration.
The main event, the fourth Italian free software conference, was not as exciting. The organizers decided to paint the event with local colors only, thus making it less interesting for me, as my job takes me around more abroad than in my own country. Oh' well. It's the Italian conference after all, and it has the right of being an all-Italian show. IMO, it's a pity, though. Exposure to external ideas is often beneficial, and I would have welcomed at least some openness beyond the national boundaries.
Anyway, the show was rescued by another workshop, organized by a local company, which was not afraid of venturing in international waters. The company is named Tzente, which means People in Sardinian, and true to their name they brought to the event several international companies, with Italian and foreign speakers. Also this event was very lively and interesting.
The hosts (Tzente) are a company to keep an eye on. They produce an open source software, called Java Agile Portal System or jAPS for short. In spite of the name, jAPS is much more than a portal. It's an integrated and versatile system to automate companies and public organizations. Their business model is simple: give the components away for free, thus building a community of users and testers, and sell integration and customization services either directly or through partners. Compared to many other startups, Tzente has a distinct advantage over the competition. They have customers. Not only they produce an excellent open source product, but their monetization strategy has already brought them large customers in Italy and abroad. They have partnerships with well known names (among which MySQL, Pentaho, Ingres, Red Hat) and I think they are going to grow fast. Good job, Tzente!
A personal note. At the first workshop I did my presentation in English, and I felt very comfortable. At the second workshop, I did the same presentation in Italian. I had the feeling that the English version was more effective. Probably I am traveling too much.

Sunday, June 06, 2010

Performance gain of MySQL 5.1 InnoDB plugin

plugin performanceYou know already that InnoDB in MySQL 5.5 has great improvements in performance and scalability. You will have to wait a few months for that, though, because MySQL 5.5 is not GA yet.
But if you need some extra performance in MySQL 5.1, you may want to use the Innodb Plugin instead of the built-in one. As of version 5.1.47, the Innodb plugin is of GA quality, and it comes with a good out-of-the-box improvement compared to the built-in engine.

To test my assumptions, I used one of my test Linux servers to perform a sysbench on 5.0.91, 5.1.47 built-in and plugin, and 5.5.4. The MySQL servers were all configured with
innodb_buffer_pool_size=5G

MySQL 4.1.47 was tested both as out-of-the-box, and with the plugin enabled.

ignore_builtin_innodb
# note: the following statements must go all in one line
plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so

default-storage-engine=InnoDBinnodb_file_per_table=1
innodb_file_format=barracudainnodb_strict_mode=1

The test was the same for all the servers. A simple sysbench both read-only and read/write on a 1M records table.

sysbench \
--test=oltp \
--oltp-table-size=1000000 \
--mysql-db=test \
--mysql-user=$USER \
--mysql-password=$PASSWD \
--mysql-host=$HOST \
--mysql-port=$PORT \
--max-time=60 \
--oltp-read-only=$ON_OFF \
--max-requests=0 \
--num-threads=8 run

What came out is that, by using the innodb plugin instead of the built-in engine, you get roughly 15% more in read-only, and close to 8% in read/write.


Note that 5.5. enhancements are more impressive in scalability tests with more than 8 cores. In this server, I have just tested a simple scenario.

I did some more testing using "ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=X" in the InnoDB table, where X changed from 4 to 16. But sysbench didn't seem to play well with compression. For low values of KEY_BLOCK_SIZE, you actually get a much worse result than the built-in engine. I have yet to figure out how I would use this compressed InnoDB in practice.