The Data Charmer: 2015

Wednesday, November 25, 2015

Percona Live Data Performance Conference 2016 - CFP and Community voting

The call for participation for Percona Live Data Performance Conference 2016 is still open. Deadline for submission is November 29th.

There are two immediately noticeable novelties in this edition:

The name change. Not "MySQL & Expo" but "Data Performance Conference." It makes the conference open to a broader set of topics.
The community voting. Proposals can get evaluated by the community before the review committee takes decisions.

I think it's a good choice. Other conferences adopt the same method. The attendees choose what they want to see and hear. In this case, it's mixed method, where the community voting is used as an indication for the review committee, which, by my understanding, has the final say.

Vote for my proposals!

Below are my proposals. Clicking on the links below will take you to the conference site, where you can say if you want to see these talks or not. You will need to register (to the site, not yet to the conference) in order to cast votes.

MySQL Replication, the community sceptic roundup

Here is a talk that is in continuous evolution. It discusses the latest advances in replication, and gives an honest evaluation of the features from a new user standpoint. This talk, if accepted, will be updated with the latest novelties in MariaDB and MySQL, if they come out in time for the conference. You can see in my blog six articles covering related matters.

MySQL operations in Docker

Another tutorial, this one aimed at users of containers who want to get started with this exciting technology. Also for this topic I have written a few articles.

MySQL deployment methods - whole servers, sandboxes, virtual machines, containers

This is a short talk that wants to explain the differences between deployment methods. Standalone physical servers, sandboxes, virtual machines, and containers are choices that require some information to get started. This talk, for which I also wrote an article, wants to show the good and bad of each choice.

The geeky pleasure of wet shaving

This is a lightning talk, which is not about data performance, but it's a geeky topic and I was asked to submit it. So here it is!

Sunday, November 22, 2015

MySQL and Docker on a Mac: networking oddity

This is a quick post only indirectly related to the series of articles about Docker that I have written recently.

Yesterday I was giving a presentation about Docker in Buenos Aires, and as usual I included a long live demo. Almost all went as expected. There was one step that I tried some time ago, and had always worked well, but when I tried to repeat it on stage, it failed miserably:

Step 1: run the container

$ docker run  -d --name mybox -e MYSQL_ROOT_PASSWORD=secret mysql/mysql-server
50acff1c7d237b2944f3fbdd5d230e4ecca2bcccd4e4ba06aa778ee20d41631e

Step 2: get the IP address

$ docker inspect --format '{{ .NetworkSettings.IPAddress}}'  mybox
172.17.0.2

Step 3: connect to the container from a local client.


$ mysql -h 172.17.0.2 -u root -psecret
Enter Password:

And here the connection hangs.

That was odd. This operation was working fine at home. I have it pasted into my first article of the series. Why it did not work here?

I could not solve the issue on stage. I had to go on with my presentation. The installation of three nodes in replication went well. Everything that I tried at home was working, except connecting from the external client to the server inside a container.

When I came back to the hotel, I realised what was different: I had done all the tests for the first article on Linux only, and I started testing on Mac when using the material for the second article. Thus I hadn't tried this particular example on a Mac until I went on stage. And of course it did not work: I was trying to connect from my Mac to a container inside a virtual machine which uses a different subnet and cannot be reached without indirection.

The only way of connecting a local client to the container is by exposing explicitly the MySQL port to the external (and in this case the "external" is the virtual machine itself) and then connect to the virtual machine.

This approach works:

$ docker rm -v -f mybox
mybox

$ docker run -p 3306:3306 -d --name mybox -e MYSQL_ROOT_PASSWORD=secret mysql/mysql-server
2798f4d1b9abb023453b1141a808014fdf7666221498b42c96e2a5d58fc6ce65

Here the port 3306 of the container is attached to the port 3306 to the host (= the virtual machine).


$ docker-machine ip default
192.168.99.100

Now we get the address of the virtual machine, so that we can connect to its MySQL port.


$ mysql -h 192.168.99.100 -u root -psecret
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.9-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

And the connection now works!

I owe it to the attendees to my talk to explain the oddity, but I believe it could be useful for others out there as well.

Saturday, November 21, 2015

Default users in MySQL 5.7

Among the many New features introduced by MySQL 5.7, we can notice a strong trend towards improving the server security by default. Two features stand out in this respect:

A password-less root is no longer the default for new installations. Unless you say otherwise, the default installers mysqld --initialize and the deprecated mysql_install_db will generate a random password which the user needs to change.
The anonymous accounts are no longer created by default. When you start MySQL, you only get the root user (and a new one: read on).

The above features are a great advance not only for security but also for usability. The anonymous users were a continuous source of mismatched connections, with difficult to explain errors, and confusion for beginners and experts alike. That's why MySQL-Sandbox has removed the anonymous accounts since its first release.

There are, however, two more changes in the privileges tables:

We have a new user, mysql.sys, which is not a usable account, as it comes with a deliberately invalid password. Its role is only to allow the sys objects to have an owner different from root. This novelty has caught me by surprise, and I had to adjust the management of users in MySQL-Sandbox, to prevent removal of this account attributes from various mysql tables: user, db, tables_priv. In fact, since we don't have anonymous accounts anymore, MySQL-Sandbox does not remove rows from db and tables_priv. (The changes were apparently new for the team maintaining MySQL images on Docker, as this user is not available on Docker MySQL containers: the entrypoint file removes all accounts from the user file.)
The test database is removed by default. This is, in my opinion, a mistake. The reason for the vulnerability of the test database was that it was open to use for the anonymous users. But since we don't have anonymous users anymore, deleting the test database is like obeying a superstitious belief.

Anyway, MySQL-Sandbox 3.1.02 comes with a few small bug fixes, among which is the preservation of the mysql.sys user and a few adjustments to the tests to take into account the latest change. The test database is always present in sandboxes, despite the above mentioned irrational removal.

Let me demonstrate the issue. In a brand new installation, we create an anonymous user and the test database:


mysql> create user '';
Query OK, 0 rows affected (0.02 sec)

mysql> select host,user from user;
+------+------+
| host | user |
+------+------+
| %    |      |
| %    | root |
+------+------+
2 rows in set (0.00 sec)

mysql> create schema test;
Query OK, 1 row affected (0.01 sec)

Then we try to access the server


# mysql -user=''
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.9-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test
ERROR 1044 (42000): Access denied for user ''@'%' to database 'test'
mysql> use performance_schema
ERROR 1044 (42000): Access denied for user ''@'%' to database 'performance_schema'
mysql> use sys;
ERROR 1044 (42000): Access denied for user ''@'%' to database 'sys'

mysql> show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)

mysql> select table_schema,engine,count(*) from information_schema.tables group by table_schema,engine;
+--------------------+--------+----------+
| table_schema       | engine | count(*) |
+--------------------+--------+----------+
| information_schema | InnoDB |       10 |
| information_schema | MEMORY |       51 |
+--------------------+--------+----------+
2 rows in set (0.00 sec)

mysql> show grants for '';
+------------------------------+
| Grants for @%                |
+------------------------------+
| GRANT USAGE ON *.* TO ''@'%' |
+------------------------------+
1 row in set (0.00 sec)

So, the anonymous user is unable to do harm, as it can't even see the databases. The only effective measure was cleaning up the table mysql.db, which was the one giving access to the test database to the anonymous users.

Monday, November 16, 2015

MySQL-Docker operations. - Part 4: Sandboxes, virtual machines, containers.

Previous episodes:

MySQL-Docker operations. - Part 1: Getting started with MySQL in Docker.

MySQL-Docker operations. - Part 2: Customizing MySQL in Docker.

MySQL-Docker operations. - Part 3: MySQL replication in Docker.

We're going to explore the choices and the differences between various types of deployments. We will consider four use cases:

[Friendly]: Testing an application on a server where a different version of the same application is already installed (examples: a Python app requiring many libraries, a MySQL server);
[Intrusive]: Testing a potentially intrusive application (anything that changes your general settings in /usr or /etc);
[Conflicting]: Running a service that has lots of conflicting dependencies (an updated database driver compiled with a version of MySQL different from what you have installed);
[Intractable]: Running an intractable service, one of those that require a specific user to run and assume they have full control of the operating system (e.g. Postgresql, Oracle).

For each case, we need to determine the impact on our well being. We assume that the user starts with one reasonably powerful server.

The method used will affect our operations in several ways:

Cost: How much would it cost to implement this method.
Time: How much time will be needed to get things done.
Performance: Can we run things as fast as we need.
Ease of use: Can we get things done without reading a lengthy manual or using an unforgiving and complicated procedure;
Isolation: Can you run your server without affecting other servers?
Storage: Can we add or change storage easily?
Scalability: Can we easily repeat the procedure as many times as needed.
Availability: Can we run any service using this method?
Portability: Can we run this service on several operating systems?
Networking: Can we use this method to run operations that require a network?

Running servers on a regular host

The first possibility to solve our problem is simple. Take an empty server, install the service, run it. Until not long ago, before the advent of cloud computing, this was the only way to run operations: if your server is not enough, buy a bigger one, or buy many small ones and get smarter with them. But inevitably, whether we wanted to install a new service or test a new version of a known application, we needed to find money and physical space to get the job done.

Regular apps

Figure 1 : Applications within a server share the operating system and library resources

In this configuration, everything is by the book. We assume that we will use one physical host to run a main service, using the best configuration we can get to achieve the purpose.

The evaluations in the following table are based on my own experience and may differ from what others feel or need.

Requirement	score	notes
Cost	–10	You need to own a new server
Time	8	You need to install it, but it can be easily automated
Performance	10	Nothing beats bare metal
Ease of use	8	As easy as the installation procedure makes it.
Isolation	10	Not going to affect services in other machines.
Storage	–10	Changing storage requires physical manipulation
Scalability	–10	Every new server requires a new purchase
Availability	10	We can run anything.
Portability	10	We can install the O.S. that we need, and the services on top of it
Networking	0	We can use, but can't create or simulate networking.
Total	+56 / -30

The negative results should be considered separately from the positive ones. What could be a prohibitive condition for an individual could be merely a nuisance for someone in a stronger position. For example, if you already have access to bare metal servers for the next two years, thanks to an advantageous merger, you may not feel the cost factor to affect you too much.

Evaluation for the bare metal servers usability:

[Friendly]: easily used. No problems here.
[Intrusive]: difficult to use. Installing one of those means that you may have trouble installing anything else.
[Conflicting]: Extremely difficult. You may end up with the inability of upgrading a given service unless you also upgrade all the dependencies, and end up upgrading the whole operating system out of desperation.
[Intractable]: Extremely difficult. Once you install one of those, you may not be able to use the server for anything else.

Running servers in a sandbox

In this context, by sandbox I mean an application that runs on a server with strict configuration settings that prevent it from misbehaving. One example for this category is MySQL-Sandbox, where one or more MySQL servers are installed in a host, each of them configured in such a way that it does not clash with the others.

Sandboxes

Figure 2 : Sandboxes are regular applications that were carefully configured to behave well without disturbing the neighbors.

While MySQL-Sandbox is designed for testing, deploying several production servers on the same host is a common practice. The main reason for it is that commodity servers have become more and more powerful, but the software hasn't caught up to utilise such power to its fullest. In this context, using a single server on such powerful hosts would be a waste, while installing two or three servers would provide for better effectiveness.

This type is similar to running a plain bare metal server. You are running your MySQL server very close to the metal, as there is no software layer between the server and the operating system. Applications configured this way are as fast as the hardware allows. However, they are not as secure. While a lonely server running inside its dedicated host does not have to worry about clashing, a sandbox is sharing libraries and other operating system resources to other similar servers, and a clash is easy to provoke. It would be enough to mix up the configuration settings, and one or more of them would either stop working or corrupt data. Or it could happen that a sandbox could drain all the resources (e.g. the main memory) leaving all the other contenders in the cold.

Requirement	score	notes
Cost	10	No investment required
Time	8	As easy as the installation procedure makes it
Performance	10	Still bare metal, even if there is potential concurrency.
Ease of use	10	As easy as the manual says it is
Isolation	-5	Depends on the service configuration. Although it is functionally independent, the services can clash.
Storage	5	Sandboxes can be resized at will (within the limits of existing storage).
Scalability	10	Deployment of new instances is only limited by the host resources.
Availability	5	We can run only applications that are fully configurable.
Portability	-5	We can only run applications for the host O.S.
Networking		We can use but can't create or simulate networking.
Total	+58 / -10

There are several advantages to using sandboxes instead of a dedicated host, such as being able to deploy multiple servers without buying new hardware or installing virtual machines. There are, however, obvious limitations, like the lack of isolation mentioned above and the fact that only applications compiled for the host operating system can run in this fashion.

Evaluation for the sandboxes usability:

[Friendly]: easily used. This is the strong point of sandboxed applications.
[Intrusive]: Difficult to use. Sometimes impossible.
[Conflicting]: Difficult but possible to use. It's one of the case where having a conflicting application used in a parallel environment could be beneficial.
[Intractable]: Almost impossible to reduce to a sandboxed environment.

Running servers in virtual machines

Figure 3 : A virtual machine isolates the application and the operating system.

Virtual machines are the heart of current cloud computing strategies. The ability of creating servers that behave almost like bare metal ones –without need for physically buying them and transporting into a data center– has changed the economy of most companies in the past decade.

Requirement	score	notes
Cost	-5	Moderate investment required
Time		As easy as the installation procedure makes it. But the O.S. must be installed as well
Performance	-10	There is much overhead from the additional layers and the need of having a full O.S..
Ease of use	8	Everything that is allowed through the interface.
Isolation	9	It can be as good as a physical host. There is still the risk of a VM affecting negatively others.
Storage	5	V.M.s can be resized at will (within the limits of existing storage).
Scalability	10	Deployment of new instances is only limited by the host resources.
Availability	10	We can run anything.
Portability	10	We can install the O.S. that we need, and the services on top of it
Networking	10	We can use and create networks.
Total	+62 / -15

Compared to bare metal, virtual machines can scale at will. You can deploy in a few minutes a new VM of the size that is needed for your current business, and get rid of it when the need ends. Unlike sandboxes, you can run any operating system and any application. In addition, you can have a network for public and private communication between servers.

There are prices to pay. First of all, it will cost you. Depending on the usage, they could be much cheaper than buying and storing your own physical servers, but they won't be free. Sure, you can install a virtual machine in your initial server, the same way that you can do it for a sandbox, but then you get into the second great limitation: performance. Even with the best software available today, the performance of a server running in a VM is greatly inferior to a server on bare metal.

You can compensate for performance by splitting the job into many parts and deploying many small virtual machines that will work in parallel. When a solution like this is successfully deployed, the performance of the group of virtual machines can surpass that of a single bare metal server. Unfortunately, to achieve this goal, you would incur more costs than buying a single server, and your application will need to be adapted to working in a distributed environment. This solution can work, and it has been deployed successfully in many cases, but it is not a one-size-fits-all, and done with poor planning can backfire.

Evaluation for the virtual machines usability:

[Friendly]: easily used. No problems here.
[Intrusive]: Easily used with overhead. Just install another virtual machine.
[Conflicting]: Easily used with overhead.
[Intractable]: Easily used with overhead.

Running servers in containers

Docker

Figure 4 : Docker containers are thin layers of libraries and applications on top of a common kernel.

Containers are a growing trend in the virtualization ecosystem. If, by the previous statement, you believe that containers are virtual machines, you need to reconsider immediately, or risk failing to understand this technology. Containers are not virtual machines, although they have many things in common. Like virtual machines, containers are entities that are not in the host computer, can be deployed in a package, started, and the service inside it can be used more or less like a server on bare metal.

The differences between virtual machines and containers are a few, and very important:

A container does not pack a full operating system, but just a thin layer of the needed libraries to run the service in it;
The service itself is often a stripped down version of the original application.
Most important, the software in the container uses the host kernel directly, without any intermediate layer.
For the above reasons, while a virtual machine starts up in minutes, a container starts up in less than a second.

A container is a well packaged application that can be downloaded very quickly, and once downloaded can be instantiated several times with incredible speed.

Another notable difference between containers and virtual machines is that containers are less isolated, because they use the same kernel as the host, rather than a virtualized one. On one hand, this makes containers less secure, on the other hand, they are blazingly fast.

Docker shared

Figure 5 : Docker containers can share libraries and other image layers

There is another reason for containers speed and low storage occupancy. Docker containers are deployed in layers. Some of those layers can be used by a single container, others could be in common between two or more containers. While a virtual machine is an enormous blob which can reach several GB, a container could be a thin modification of an existing image, and thus can be downloaded in seconds and deployed even faster.

Requirement	score	notes
Cost	10	No investment required
Time	10	Fast, fast, fast!
Performance	9	Almost as fast as running on bare metal. Tiny overhead.
Ease of use	3	Requires some learning and new workflows.
Isolation	7	Much better than a sandbox. Less than a V.M., because containers use the same kernel.
Storage	5	Containers can be resized at will (within the limits of existing storage).
Scalability	10	Deployment of new instances is only limited by the host resources.
Availability	3	We can run only applications that have been adapted for containers.
Portability	-5	We can only run applications for the host O.S.
Networking	10	We can create and use netweoks.
Total	+67 / -5

What are the strong points of containers? Low cost (or no cost, if all you need is what fits in your current server), good performance, private networking, easy to scale.

The limitations, as of today, are portability (applications can only run in the same OS as the host) and the ease of use. This is a point that is going to change. Using containers requires some changes in the applications (or finding ready made images) and an understanding of the environment, which could be intimidating for people used to the old ways. But once you get past the initial learning phase, everything feels very easy, and eventually the usage will be far easier than the old ways.

Evaluation for the containers usability:

[Friendly]: easily used. No problems here.
[Intrusive]: easily used, with little or no overhead.
[Conflicting]: easily used with little or no overhead.
[Intractable]: difficult to use, sometimes impossible if the intractable application or service was built without flexibility in mind.

All solutions comparison

For convenience, I made a table with a comparison of the solutions examined above.

I must stress that these evaluations are my own, very much subjective, based on my experience. The evaluations may differ from others, and possibly also from my own in a few years or months. Talking about Docker is like catching eels: it's a moving target where the technology evolves and improves daily. This fluidity is possibly the most appealing characteristic of Docker and the container related technology: its evolution has been and continues to be fast and effective, addressing the users needs at incredible speed.

Requirement	Bare metal	Sandbox	Virtual machine	Container
Cost	–10	10	–5	10
Time	8	8	0	10
Performance	10	10	–10	8
Ease of use	8	10	8	3
Isolation	10	–5	9	7
Storage	–10	5	5	5
Scalability	–10	10	10	10
Availability	10	5	10	3
Portability	10	–5	10	–5
Networking	0	0	10	10
Total	+56 –30	+58 –10	+62 –15	+66 –5

I believe we haven't seen the end of this trend yet. What we have seen so far with containers and virtual machines seems to aim at an architecture built on micro services. Containers could take a substantial role in the transition towards that reality.

What can we take away from this analysis?

Bare metal servers are not outdated yet. There are still cases where they are irreplaceable. Despite the cost associated with their usage, they are not extinct yet, but just.
Virtual machines are still in charge of the scalability department in many cases. However, they feel the advance of containers and need to either evolve or merge into a more flexible architecture to deal with increasing demands from users.
Containers are the new force in IT. They can play well with both bare metal servers and virtual machines, waiting for the rise of container-oriented operating systems, which already exist and aim at world domination in a not distant future.

I see a future where the rise of containers and micro systems will force software makers to simplify their products and make them more modular and easy to play with. This trend is important in the current cloud architecture and will become vital when containers take over.

In the meantime, I am not giving up MySQL-Sandbox, which is still indispensable I'm most scenarios, but I am starting to rethink the architecture to fit smarter future uses.

MySQL deployment summary

With all the above considerations, where do we stand with MySQL? My view is that we're still in middle ground. MySQL is still used heavily on bare metal, either as a stand-alone server or as a part of multi server deployments in the same host.

It is also massively employed in the cloud, where it offers many advantages for deployment flexibility and ease of scalability. Yet it still lacks the agility necessary to be a native cloud component. There are several attempts at creating a better cloud player out of MySQL, some successful, some less so.

When it comes to containers, MySQL has still much work to do to become an efficient building block in the new ebullient architecture expansion. The MySQL team provides an official package, which is a first step towards becoming a good player. But in the near future there will be demands of more integration and better modularity than what's available today. Looking at the internals of MySQL deployment in a container shows that the system is struggling to adapt to the new medium. I see the container revolution as an opportunity for established applications like MySQL to improve their usability and increase their ability to play well with other components of the emerging IT infrastructure.

What's next

In the next (and last) episode we will see MySQL, Docker and orchestrating tools playing together to deliver faster and more powerful operations.

Wednesday, November 04, 2015

MySQL-Docker operations. - Part 3: MySQL replication in Docker

Previous Episodes:

MySQL-Docker operations. - Part 1: Getting started with MySQL in Docker

MySQL-Docker operations. - Part 2: Customizing MySQL in Docker

With the material covered in the first two articles, we have all the elements needed to set up replication in Docker. We just need to put the pieces together.
If you want to do everything by hand, it will only take a few minutes. The steps are not complicated. If you have followed the reasoning in the past episodes, you will know what to do.
Or, you can make your life easier by using the ready-made scripts available in Github as MySQL replication samples. In fact, what this article will do in practice is adding comments to a stripped down version of the deployment script, which will make things clear.
First, we will use two templates for the configuration files. Notice that the server-id value is replaced by a placeholder. We will also use a smaller options file for the client's username and password.

$ cat my-template.cnf
[mysqld]
user  = mysql
port  = 3306
log-bin  = mysql-bin
relay-log = mysql-relay
server-id = _SERVERID_
master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
enforce-gtid-consistency

$ cat node-my.cnf 
[client]
user=root
password=secret

First phase: deploying the containers

Note: Don't try to copy-and-paste the code below. It is a reduced version that is used here only for the sake of commenting it. I have deliberately added line numbers to make copy-and-paste harder. Take the code from github instead.

MySQL-Docker operations. - Part 2: Customizing MySQL in Docker

Previous Episodes:

MySQL-Docker operations. - Part 1: Getting started with MySQL in Docker

After seeing the basics of deploying a MySQL server in Docker, in this article we will lay the foundations to customising a node and eventually using more than one server, so that we can cover replication in the next one.

Enabling GTID: the dangerous approach.

To enable GTID, you need to set five variables in the database server:

master-info-repository=table
relay-log-info-repository=table
enforce-gtid-consistency
gtid_mode=ON
log-bin=mysql-bin

For MySQL 5.6, you also need to set log-slave-updates, but we won't deal with such ancient versions here.
Using the method that we've seen in Part 1, we can use a volume to change the default /etc/my.cnf with our own.

$ cat my-gtid.cnf
[mysqld]
user  = mysql
port  = 3306
log-bin  = mysql-bin
relay-log = mysql-relay
server-id = 12345

master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
enforce-gtid-consistency

However, this approach may fail. It will work with some MySQL images, but depending on how the image is built, the server may not install at all.

$ docker run --name boxedmysql \
    -e MYSQL_ROOT_PASSWORD=secret \
    -v $PWD/my-gtid.cnf:/etc/my.cnf \
    -d mysql/mysql-server
b9c15ed3c40c078db5335dcb76c10da1788cee43b3e32e20c22b937af50248c5

$ docker exec -it boxedmysql bash
Error response from daemon: Container boxedmysql is not running

The reason for the failure is Bug#78957. When my.cnf contains log-bin and mysql is called prior to the installation to perform some detection tasks, the server creates the binary log index in the data directory. After that, the installation task will abort because the data directory is not empty. It sounds as if there is a set of unnecessary actions here (the server should not create the index without other components in place, and the installer should not complain about finding a harmless file in the data directory) but this is the way it is, and we should work around it. At the time of writing, the bug has received a temporary fix and the installation now works.
All considered, it's best that we are forced to run things this way, because there are side effects of enabling GTIDs at startup: there will be unwanted GTID sets in the server, and that could be annoying.

MySQL-Docker operations. - Part 1: Getting started with MySQL in Docker

Docker is one of the fastest growing trends in IT. It allows fast deployment of services and applications on a Linux machine (and, with some limits, on other operating systems). Compared to other methods of deploying databases, such as virtual machines or application isolation, it offers faster operations and better performance.
Many people, surprised by the sudden advance of this technology, keep asking What is Docker? And why you should use it?
I will write soon an article with a deep comparison of the three methods (VM, container, sandbox), but for now, we should be satisfied with a few basic facts:

Docker is a Linux container. It deploys every application as a series of binary layers, containing just the minimum dependencies (libraries and applications) to make the service work;
It stores images in a central registry, from where the docker client can download them quickly;
By its definition, it is lightweight. If you have the images already in your system, deployment of the service happens in seconds.
Unlike virtual machines, where you can deploy virtualized Windows and other non-Linux environment, Docker is Linux-only. You can virtualize every service, provided that it runs on Linux.
Docker can run applications in various flavors of Linux at once. It actually makes the Linux flavor dependency transparent, to the point that the users barely realize that.

Installing Docker

Docker installation is pretty much straightforward. The Docker documentation covers the basics and the fine points of installing in any operating system. Rather than repeating the procedure here, I recommend looking the pages for Ubuntu, Mac OS X, or Windows.
Once the installation is complete, the commands shown in this article will apply to all platforms. When there are exceptions, it will be noted in the text.

Sound advice for GTID, with caveats

During the PerconaLive conference in Amsterdam, I attended a session where I heard a good piece of advice about using GTID. It amounts to: look at SHOW SLAVE STATUS output, and if you see more than one line in the Executed_Gtid_Set field, this tells you immediately if someone has written on a slave database.
This is good advice. Let's dissect it. Here is what a regular slave looks like, when nobody has messed up with it:

SHOW SLAVE STATUS\G
*************************** 1. row ***************************
 [...]
             Master_Server_Id: 1
                  Master_UUID: 00013454-1111-1111-1111-111111111111
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
1 row in set (0.00 sec)

What you see here is a slave that has received transactions from a single source (Retrieved_Gtid_Set lists only one GTID set) and has applied data from a single source (also Executed_Gtid_Set shows a single item.)
Notice that this advice holds true even when the slave being considered is an intermediate one, i.e. a relay slave which is master of one or more slaves. Due to the nature of GTIDs, even though the intermediate slave is recording the transactions to its own binary log, the transaction identifier does not change. Thus you should see a clean set of transactions throughout the chain. For example, if you have another slave that is replicating from slave #2, you would see something like this:

SHOW SLAVE STATUS\G
*************************** 1. row ***************************
 [...]
             Master_Server_Id: 102
                  Master_UUID: 00013456-3333-3333-3333-333333333333
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:

MySQL-Sandbox 3.1.01 - First release after the change

I have released MySQL-Sandbox 3.1.01, which is the first release after the move to GitHub. While the changes are not so spectacular (it's a minor release, with mostly bug fixes), I am pleased to see that the move has started producing collaboration. Two of the changes were provided by Daniël van Eeden and Mark Leith, who have scratched some of their own itches by providing useful patches.

All in all, this period of working with GitHub has been liberating. Although Bazaar plays with the same principles of git, it lacks most of the tools and the know-how which characterizes git. Add to this that also my team has moved Tungsten Replicator to Github, and with that I found myself all of a sudden free of old revision control systems, and master of my own time.

Back to MySQL-Sandbox: while its enhancements may not amount to much, it helped me to discover several bugs in MySQL, some of which were addressed and solved quickly. So, I have had a deeper relationship with the community, with the experience of being at both ends of the collaboration ops.

The last notable piece of news about this release is that it has been tested with the latest and greatest available: a preview of MySQL 5.7.9 and the latest MariaDB 10.1.6. With this, I hope to witness a GA release of either flavor that does not break MySQL-Sandbox. We'll see!

Monday, September 28, 2015

MySQL 5.7 : Playing with mysqlpump

MySQL 5.7 comes with a new backup tool, named mysqlpump, which is almost the same as mysqldump with the ability of extracting data in parallel threads.

I tried a little experiment. Using a server containing 11 databases, with a total of 300 tables and about 20 million rows (roughly ≈ 10GB,) I used both mysqldump and mysqlpump to get a backup.

mysqldump --all-databases  > dump.sql

mysqlpump --all-databases \
    --add-drop-database --add-drop-table --skip-watch-progress \
    --default-parallelism=10 \
    --parallel-schemas=db,db1,db2 \
    --parallel-schemas=db3,db4,db5 \
    --parallel-schemas=db6,db7,db8 \
    --parallel-schemas=db9,db10 > pump.sql

The backup with mysqldump took 3 minutes and 33 seconds. The one with mysqlpump took 2 minutes and 55 seconds (saving 38 seconds). This does not seem to be a great gain. I experimented with several values of default-parallelism and different grouping of databases, and also without any parameters at all, but I always get the same time.

If there is a different way of invoking mysqlpump to use parallelism better, I would like to know.

There are four interesting points about mysqlpump that users should know:

mysqlpump has options to include and exclude objects (databases, tables, routines, users) from the backup. This is a long awaited feature that will be welcome by many DBAs.
The option --no-data is called --skip-dump-rows. (Just in case you want to use the new tool alternate way of reproducing DDL. But be aware that there is at least one bug)
A backup created with mysqlpump can only be loaded into a database of the same name. This is due to the parallel work, which requires that the INSERT statements contain both the database and the table names. But it means that, unlike with mysqldump, you can't backup tables from database X and load them to database Y.
The most serious limitation of mysqlpump, which I have seen both in the manual and in a blog article is that, while the backup is parallelized, the restore is serialized. Both sources say to run "mysqlpump > file.sql" and "mysql < file.sql". What is the advantage of extracting data with N parallel threads if I then need to apply it with a single thread? I would have expected an option to create N files, which I can then load using several background tasks, or even better an option in the mysql client to handle parallel backup files. I may be missing something here. I will appreciate comments by more savvy users.

The idea is good. The tool still has some rough edges, but I am sure it can be improved.

Tuesday, September 15, 2015

Percona Live Amsterdam - September 21-23, 2015

PL EuropeLogo FullInv CMYK Final Horiz EMAIL

I am attending Percona Live Amsterdam 2015 on September 21-23, 2015.

I will be on stage three times:

September 22nd at 2:10pm Pivot tables: analytics in pure SQL
September 23rd at 2:10pm The future of replication is today: new features in practice
September 23rd at 5:10pm as host and speaker in the Lightning talks

My first talk is a topic that has ben among my favorites for long time: I published an article about it in 2001, and several more in the years to come.

The second one is a summary of what I have written recently about replication technologies.

The lightning talks are a collection of 5-minutes long talks that are presented by different speakers. For the first time, the LT are held in a separate room instead of being attached to one of the community events. It will be fun!

Percona has just released a mobile app for the conference for both iOS and Android. With it, it is possible to set a personalized schedule, follow the show more closely, and get in touch with other attendees. It is a very good addition!

There is much to watch at the conference, and I look forward to seeing the latest innovation in the field. I will miss some very interesting talks because they are at the same time as mine (!!) but I hope I will catch up with the speakers in the conference hall.

Monday, September 14, 2015

Improving Sakila database

The Sakila sample database was created almost 10 years ago, as a sample set of data for MySQL courses and examples.

The database was developed by MySQL employees, with substantial contributions form the community.

Recently, the database was updated to use some of the features in MySQL 5.7. As a result, we had two sets of samples, one to use with MySQL 5.0+, and one that only loads with MySQL 5.7.

I filed a feature request, offering a patch to use conditional schema and data changes, which was incorporated very quickly into the official release.

The current release, available within the MySQL docs, has conditional comments such as this:

/*!50610 ALTER TABLE film_text engine=InnoDB */ ;

Using these comments, we can enable specific features if the version is at least the one indicated in the comment. So, for example, we can use InnoDB tables with full-text indexes starting with version 5.6. The original table is MyISAM, but if the current version is at least 5.6.10 (that's the meaning of !50610) then the engine is changed to InnoDB.

>Similarly, there is a GEOMETRY column and SPATIAL key in the 'address' table, which are only enabled for MySQL 5.7.5+. A similar comment allows the loading of the relevant data only in MySQL 5.7.

Using these new files, you can install the Sakila database using any version of MySQL from 5.0 onwards, and it will always load correctly.

Monday, September 07, 2015

Sample employees database migrated to GitHub

It's migration time. There was another project that I use often and was still in Launchpad. The Sample Employees Database is now on GitHub, under the same license it had before (CC A-SA 3).

Figure 1 - Employees database
This database is interesting because it is not too small (like Sakila) and not too big. It has enough data to allow you to test in a non trivial way.

How MySQL-Sandbox is tested, and tests MySQL in the process

MySQL-Sandbox is a great tool for testing a new release, and in fact this is what I do when a new MySQL tarball becomes available. I don't think many people are aware of the full testing capabilities of the sandbox, though.
When you think about testing, you may just think of creating a sandbox with the new tarball, and then hammering it with your pet procedure. That works, of course, as the main purpose of MySQL-Sandbox is to allow you to do just that. There is, however, a full test suite that can tell you in a short while if your tarball is compatible with the past or not.
This procedure is quite strict. It has happened several times that I caught a bug in a new release of MySQL, or Percona Server, or MariaDB, just by running this suite.

MySQL replication in action - Part 5 - parallel appliers

Previous episodes:

MySQL replication monitoring 101

MySQL replication in action - Part 1: GTID & Co

MySQL replication in action - Part 2 - Fan-in topology

MySQL replication in action - Part 3 - All-masters P2P topology

MySQL replication in action - Part 4 - star and hybrid topologies

Parallel replication overview

One of the main grievance of replication users is that, while a well tuned master server can handle thousands of concurrent operations, an equally tuned slave is constrained to work on a single thread. In Figure 1, we see the schematics of this paradigm. Multiple operations on the master are executed simultaneously and saved to the binary log. The slave IO thread copies the binary log events to a local log, and on such log the SQL thread executes the events on the slave database. When the master is very active, chances are that the slave lags behind, causing hatred and nightmares to the DBAs.

Figure 1 - Single applier

New MySQL Sandbox 3.1 - GitHub, and usability

I have three pieces of information to share about MySQL::Sandbox:

Version 3.1.0 has migrated from Launchpad to GitHub
This version is released under the Apache license. Both these changes are meant to improve and promote cooperation on the project.
There is an important change related to usability. When using replication with MySQL::Sandbox and MySQL 5.6+, the server UUIDs become more readable (see below).

First, some words on the location changes. About two years ago, I started plans for a rewrite of MySQL::Sandbox. Then, I had some unexpected changes, which involved moving home to a different continent twice within twelve months. The project was neglected, but I haven't dismissed it. While I wait for the rewrite to start, I wanted to get rid of the obstacles for rapid development, and I decided to transfer the current codebase to GitHub. This will allow me to use only one RCS instead of three (My team has abandoned svn too ¹).
Apart from the changes described in this post, there is little difference in the code published on GitHub.

Tungsten Replicator moved to GitHub with Apache license

It had been in the making for long time. Google announced that Google Code would be closing, and since then the Continuent team has been hard at work to handle the transition. You can guess it: this operation would have been quicker if it had been done by a small company like we were one year ago, but being part of a large corporation introduces some constraints that have affected our schedule.

However, our wish has always been, and still is, to keep Tungsten Replicator as an open source product, with full functionalities and with the full benefits that the open source development model offers.

Today, Tungsten Replicator is available on GitHub as vmware/tungsten-replicator, and it is wearing new clothes. It is not GPL anymore. In an effort to facilitate contributions, its license was changed to Apache 2.0.

Feature-wise, there is little difference from the previous release of 4.0. Mainly, we have cleaned up the code and moved out the pieces that no longer fit:

Bristlecone was removed from the package. It is used only for testing, and it will be released separately. There is no need to duplicate it into every Tungsten tarball.
The cookbook recipes have been retired. These scripts were created when the installer was still in its infancy and we had little documentation. Therefore, it was convenient to have wrappers for the common installation operations. Using the manual, it is pretty easy to install master/slave, fan-in, and multi-master topologies. The biggest reason for removing the cookbook, though, is that it was only useful for MySQL replication. If you need heterogenous deployments, the cookbook was an obstacle, rather than being helpful.
Some files were shuffled within the deployment tree. The ./tungsten-replicator/scripts directory was merged with ./tungsten-replicator/bin, the applier templates were moved from samples to a dedicated path, and we also did some other similar cleanup.

Although it has changed location and license, this is not a "release." If you compile the code, it will come up as 4.1, but it is still work in progress. Same as what was happening in the previous repository, we tag the code with the next version, and start working on it until it is deemed ready for release. The latest release for production (4.0.1) is still available from the old directory.

The code is available on GitHub, which makes collaboration much simpler than the previous repository. Take advantage of it: fork it, and help make the best replication tool even better!

MySQL usability issues for replication

In my latest series of advanced replication features, I came across several usability issues, which I would like to recap here. For each section of this list of requests, I make a wish list, with some general comments.

INSTALLATION

As the maintainer of MySQL Sandbox, a tool that wants to facilitate the installation of MySQL for testing, I am particularly sensitive to the annoyances during installation. I have covered the matter extensively in recent weeks and many times before. My point is that the logging of the installation must be useful for the DBA, not to the MySQL developers.

Make a big distinction between [Info] and [Warning] messages. Warnings should only appear when there is some risk. Using a warning to tell the user that the InnoDB log file was created is a nonsense.
… and then make [Info] messages suppressible. Currently, I can suppress warnings, but then I would suppress also the real warnings, in addition to the fake ones.
Make mysqld –initialize use the correct syntax for timestamps, instead of issuing a warning about the obsolete syntax that was used. (See In search of cleanliness : the elusive quiet installation for a full explanation)

MULTI-SOURCE

Here we have two main problems. The first one is that the introduction of multi source channels has left some new ways of shooting yourself in the foot. By mixing the new explicitly named channels with the hidden one (but still used in regular replication!) which is named "" (= the empty string), we may start to see a new series of blunders in replication.
The second set of problems comes from weak integration between existing features and the new one. There are operation modes, such as semi-synchronous, delayed, and parallel replication that were designed with a single data stream in mind, and that lack the capability of being tuned for different channels.

Make it impossible to use the unnamed channel and the named ones at the same time. Currently only a very weak protection exists against mixing named and unnamed channels. Also the syntax for channel ’’ should result in an error when named channels are defined.
Integrate features that were designed for single sources to work better with multi source: semi-sync replication, parallel applier, delayed replication.

MySQL replication in action - Part 4 - star and hybrid topologies

Previous episodes:

MySQL replication monitoring 101

MySQL replication in action - Part 1: GTID & Co

MySQL replication in action - Part 2 - Fan-in topology

MySQL replication in action - Part 3 - All-masters P2P topology

Introducing star topology.

In all-masters P2P topologies, we have seen that we have a way of deploying a topology where all nodes are masters, and achieve better efficiency and stability than ring topologies. That method comes at the price of a complex setup, which requires, for a N-node cluster, N*(N-1) connections.
We can achieve the same result as in a P2P all-masters topology by trading connections for stability. In a star topology (Figure 1) all nodes are masters, but they do not connect to each other directly. There is a special node, named hub, which receives the changes produced by each endpoint and spreads them to the others.
Topologies star

Figure 1 - A star topology

MySQL replication in action - Part 3: all-masters P2P topology

Previous episodes:

MySQL replication monitoring 101

MySQL replication in action - Part 1: GTID & Co

MySQL replication in action - Part 2 - Fan-in topology

In the previous article, we saw the basics of establishing replication from multiple origins to the same destination. By extending that concept, we can deploy more complex topologies, such as the point-to-point (P2P) all-masters topology, a robust and fast way of moving data.

Introduction to P2P all-masters topology

A P2P (Point-to-point) topology is a kind of deployment where replication happens in a single step from the producer to the consumers. For example, in a master/slave topology, replication from the master (producer) reaches every slave (consumer) in one step. This is simple P2P replication. If we use a hierarchical deployment, where every slave that is connected to the master is also replicating to one or more slaves, we will have a 2-step replication (Figure 1). Similarly, in circular replication, we have as many steps as the number of nodes minus one (Figure 2.)
Hierarchical master slave processing

Figure 1 - Hierarchical replication depth of processing

MySQL replication in action - Part 2 - Fan-in topology

Introduction: where we stand

Previous episodes:

MySQL replication monitoring 101

MySQL replication in action - Part 1: GTID & Co

In the latest releases of MySQL and MariaDB we have seen several replication improvements. One of the most exciting additions is the ability to enhance basic replication with multiple sources. Those who have used replication for a while should remember that one of the tenets of the “old” replication was that a slave couldn’t have more than one master. This was The Law and there was no escape ... until now. The only way to work around that prohibition was to use circular replication, also known as ring replication, where each node is slave of the previous node and master of the next one.
Circular replication

MySQL replication in action - Part 1: GTID & Co

In the theoretical part of this series, we have seen the basics of monitoring. In that article, though, we have barely mentioned the new tools available in MySQL 5.7 and MariaDB 10. Let’s start from something that has the potential of dramatically changing replication as we know it.

Crash-safe tables and Global transaction identifiers in MySQL 5.6 and 5.7

Global transaction identifiers (GTID) is a feature that has been in my wish list for long time, since the times I was working with the MySQL team. By the time I left Oracle, this feature was not even in the plans.
When MySQL 5.6 was first disclosed, the biggest improvement for replication was the introduction of crash-safe tables (see Status persistence in Monitoring 101.) There are two tables in the mysql database, named slave_master_info and slave_relay_log_info. At the beginning, these tables were using the MyISAM engine, thus defeating the purpose of making them crash-safe. In later versions, the developers decided to bite the bullet and create these tables with innodb from the beginning.
These two tables allow us to see the same information previously stored in the files master.info and relay_log.info. What makes these tables convenient is that they should survive a crash better than the standalone files.

MySQL::Sandbox 3.0.66 - improved usability and support for newest releases

The latest MySQL Sandbox, version 3.0.66 is out. It has a few new features (as always, when I find myself doing the same thing many times, I script it) and improved support for latest releases of MySQL. You can now install, among other versions, MySQL 5.7.8 and MariaDB 10.1.x

Some notable additions in this release are in the scripts that are created and customized for each sandbox. There are many of them and when one more arrives, it's easy to overlook it. So, here are the new arrivals.

Changed defaults between MySQL 5.6 and 5.7

MySQL 5.7 comes with many changes. Some of them are better explained than others.

I wanted to see how many changes I could get by comparing SHOW VARIABLES in MySQL 5.6 and 5.7.
The most notable ones are:

binlog_format: the default is now ROW. This variable affects the format of the binary log, whether you use it as a backup complement or for replication, the change means bigger binary logs and possibly side effects.
binlog_error_action now defaults to ABORT_SERVER. If the server cannot write to the binary log, rather than continuing its work without logging, it shuts down. This could be a desirable course of action, but better be prepared for the eventuality.
innodb_strict_mode is enabled by default, which is probably a good thing, but it means that previously accepted events will now generate an error instead than a warning.
sql_mode is now STRICT by default. While many well prepared users will be pleased with this change, which was advocated as best practice by some DBAs, the practical outcome is that several exiting applications may break because of unclean input.
sync_binlog, which affects data safety but also server performance is enabled.

In search of cleanliness : the elusive quiet installation

UPDATE: Almost solved! See at the end.

A clean installation of a database server is one where everything goes according to the expectations. It used to be easy: you only had to do what the manual says, and, presto! you would see your database server installed and ready to use. If something went wrong, you got one or more error messages that informed you of what needs to be fixed.

Sometimes, rarely, it happened that you got also a warning message, telling you that while the installation was successful, you could improve it by fine tuning this and that. No big deal.

Gone are those times. A clean installation nowadays is a much harder exercise, if not impossible. Let’s give it a try using MySQL 5.7.7.

Yet another MySQL 5.7 silent change

When a new version of MySQL appears, the first source of information for the brave experimenter is a page in the manual named What is new in MySQL X.X, also known as MySQL in a nutshell. For MySQL 5.7, the in-a-nutshell page lists quite a lot of changes. In that page, the list of removed features is enough to send a chill down the spine of most any DBA. Some of the items in the deprecation section are removals in disguise, as they require immediate, rather than delayed, action to use the new version with existing application (SET PASSWORD comes to mind immediately.)

In all that abundance of changes, there is at least one that may cause huge discomfort to many users:

mysql [localhost] {msandbox} (information_schema) > select @@version;  
+-----------+  
| @@version |  
+-----------+  
| 5.7.8-rc  |  
+-----------+  
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (information_schema) > select * from GLOBAL_STATUS;  
Empty set, 1 warning (0.00 sec)

mysql [localhost] {msandbox} (information_schema) > show warnings\G  
*************************** 1. row ***************************  
  Level: Warning  
   Code: 1287  
Message: 'INFORMATION_SCHEMA.GLOBAL_STATUS' is deprecated and will be removed in  
a future release. Please use performance_schema.global_status instead  
1 row in set (0.00 sec)

MySQL 5.7.8 : features, bugs and rumors

I’ve had a look at a preview release of MySQL 5.7.8, some time before it became available to the general public (perks and duties of an Oracle ACE) and I found a few interesting things among the release notes and the tarball itself:

There is a new tool named mysqlpump, which is intended as a replacement for mysqldump, with parallel processing, compression, progress watch, the long awaited ability of excluding databases or tables, and more.
The json functionality has been fished out from the labs and added to the main distribution.

I was initially pleased with this preview when it installed without errors in MySQL Sandbox, but my pleasure didn’t last long, as I soon stumbled upon a bug that affects replication. Until MySQL 5.7.7, a user with just REPLICATION SLAVE privileges can activate replication. In MySQL 5.7.8, though, replication fails with a baffling error:

slave1 [localhost] {msandbox} ((none)) > show slave status\G
[...]
            Last_IO_Errno: 1142
            Last_IO_Error: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: SELECT command denied to user 'rsandbox'@'localhost' for table 'global_variables'

MySQL replication monitoring 101

Replication is the process that transfers data from an active master to a slave server, which reproduces the data stream to achieve, as best as possible, a faithful copy of the data in the master.

To check replication health, you may start with sampling the service, i.e. committing some Sentinel value in the master and retrieving it from the slave.

Sentinel data: Tap tap… Is this thing on?

If you want to make sure that replication is working, the easiest test is using replication itself to see if data is being copied across from the master to the slaves. The method is easy:

Make sure that the data you want to see is NOT in the master or in the slave. If you skip this step, you may think that replication is working, while in fact it may not.
Either create a table in the master or use a table that you know exists both in the master and the slave.
Insert several records in the master table.
Check that they are replicated in the slave correctly.
Update a record in the master.
Watch it changing in the slave.
Delete a record in the master.
Watch it disappear in the slave.

MySQL 5.7 : no more password column!

Maintaining a project like MySQL::Sandbox is sometimes tiring, but it has its advantages. One of them is that everything related to the server setup comes to my attention rather earlier than if I were an average DBA or developer.

I try to keep MySQL Sandbox up to date with every release of MySQL and (to a lesser extent) MariaDB [1]. For this reason, I am used to trying a new release with MySQL Sandbox, and … seeing it fail.

Of the latest changes in MySQL, probably the most disruptive was what happened in MySQL 5.7.6, where the mysql.user table lost the password column.

Yep. No ‘password’ column anymore. And just to make the setup procedure harder, the syntax of SET PASSWORD was changed, and deprecated.

Previously, I could run:

mysql [localhost] {msandbox} (mysql) > select version();  
+-----------+  
| version() |  
+-----------+  
| 5.6.25    |  
+-----------+  
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > select host,user,password from user;  
+-----------+-------------+-------------------------------------------+  
| host      | user        | password                                  |  
+-----------+-------------+-------------------------------------------+  
| localhost | root        | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | rsandbox    | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |  
+-----------+-------------+-------------------------------------------+  
8 rows in set (0.00 sec)

In the latest releases, though, this fails.

mysql [localhost] {msandbox} (mysql) > select version();  
+-----------+  
| version() |  
+-----------+  
| 5.7.8-rc  |  
+-----------+  
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > select host,user,password from user;  
ERROR 1054 (42S22): Unknown column 'password' in 'field list'

Instead of a password column (which was CHAR(41)), we have now an authentication_string column of type TEXT.

+-----------+-------------+-------------------------------------------+  
| host      | user        | authentication_string                     |  
+-----------+-------------+-------------------------------------------+  
| localhost | root        | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | rsandbox    | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |  
+-----------+-------------+-------------------------------------------+

Fixing MySQL Sandbox to handle this issue and to be at the same time compatible with previous releases was quite challenging, but in the end I did it. Recent versions of the sandbox can handle all the releases from Oracle, Percona, and MariaDB without showing hiccups.

So, for testing, the issue is solved. Now comes the hard part: when thousands of database administration procedures will break down for lack of the password column. To all the DBAs and database developers out there: good luck!

It is my pleasure to disclose that MariaDB 10.1 runs in MySQL Sandbox 3.0.55+, with only minimal changes. ↩

Sunday, July 19, 2015

MYSQL Sandbox 3.0.55 and new Github replication scripts

Both MySQL and MariaDB have been busy, each introducing new features, sometimes creating the same feature, often with different syntax.

This is sometimes good for users, who have a wide choice. And sometimes it is bad, as once you are used to the deployment and syntax of one flavor, it is hard to switch to a different one. This problem is enhanced if you are dealing with an application, MySQL Sandbox, that needs to work well with all flavors.

The latest releases of MySQL Sandbox (3.0.51 to 3.0.55) have been necessary to solve minor and major troublesome points with MySQL 5.7.8 and MariaDB 10.1.

The current version (3.0.55) can install all the newest releases, including replication with MySQL 5.7.8 which suffers from a compatibility bug (better explored in a separate article).

To make like easier for testers of newest versions, all replication deployments through MySQL Sandbox now include a test_replication script, which ensures that replication is working correctly. The new release also includes more tarball pattern tests, to check that known name patterns are recognized. In all, MySQL Sandbox has now about 100 tests more than before. Every time I release a new version, I run the suite with 10 or 12 versions of MySQL, Percona Server, MariaDB, for a grand total of about 5,000 tests.

And speaking of tests, there are features that require more attention than just installing a group of sandboxes, and are not easy to incorporate into MySQL Sandbox tools. For this reason, I have published on GitHub the sample scripts that I use to demonstrate multi-source replication for MySQL 5.7 and MariaDB 10. Since I was at it, I have also published the examples used for Pivot tables demos.

Tuesday, March 10, 2015

MySQL 5.7.6 is out. Be prepared for big changes

Today Oracle released MySQL 5.7.6 milestone 16. With this, MySQL 5.7 has been in development for over 2 years.
Compared to MySQL 5.6, the changes are quite extensive. The main effort of the team has been focused on speed, with performance reportedly improved from 2 to 3 times compared to previous releases.
A full list of what is new would take too much space here, but I would like to mention some key points:

Oracle has spent a considerable amount of energy in the improvement of MySQL security and safety. You will see many new features, but even more old features that were deprecated and more that were removed after deprecation in 5.6.
The installation process has been changing in every milestone release of MySQL 5.7, always towards the goal of making MySQL more robust. The effort, though, will break compatibility with installation practices used for previous versions.

In this article, I will cover the biggest changes during installation.
In MySQL 5.6, mysql_install_db has an option for the generation of a random password during database creation. The process is unfriendly to scripting but it is a step in the direction of ending the long time practice of creating a root user without a password.
In MySQL 5.7.4, there was a further change, with the random password generation becoming the default, with the possibility of skipping the creation with the option –skip-random-password.
In MySQL 5.7.5, the default was confirmed, but the option was changed to –insecure.

And now, in MySQL 5.7.6, the crackdown on old practices continues: mysql_install_db is deprecated, and replaced with mysqld –initialize (formerly known as “mysqld –bootstrap,” now deprecated.)
Here’s a test run:

$ ~/opt/mysql/5.7.6/bin/mysqld --no-defaults --basedir=~/opt/mysql/5.7.6/ --datadir=$PWD/data --initialize
2015-03-09T05:06:37.159659Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-03-09T05:06:37.355155Z 0 [Warning] InnoDB: New log files created, LSN=45790
2015-03-09T05:06:37.410118Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2015-03-09T05:06:37.467002Z 0 [Warning] Failed to setup SSL
2015-03-09T05:06:37.467029Z 0 [Warning] SSL error: SSL context is not usable without certificate and private key
2015-03-09T05:06:37.468142Z 1 [Warning] A temporary password is generated for root@localhost: f<jqhdJ(A5p#

Compared to the previous versions, the most notable difference is that there is no .mysql_secret file, but a simple line on the screen mentioning the temporary password.
But there is one, more important behavioral difference: this command works only once. When using mysql_install_db, you could run the same command even if the data directory existed, where the script would re-apply the data creation commands. Using mysqld –initialize, you can only run on a non-existing data directory.

$ ~/opt/mysql/5.7.6/bin/mysqld --no-defaults --basedir=~/opt/mysql/5.7.6/ --datadir=$PWD/data --initialize
2015-03-09T05:49:12.504413Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-03-09T05:49:12.505398Z 0 [ERROR] --initialize specified but the data directory exists. Aborting.
2015-03-09T05:49:12.505422Z 0 [ERROR] Aborting

Using the newly created database is a bit more trickier than before:

$ ~/opt/mysql/5.7.6/bin/mysql --no-defaults   -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.6-m16

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use mysql
ERROR 1820 (HY000): You must SET PASSWORD before executing this statement
mysql> set password=password('test');
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'password('test')' at line 1

Uh? What’s this? This command used to work until recently. The reason is a change in the syntax of SET PASSWORD, which now accepts a plain text argument:

mysql> set password='test';
Query OK, 0 rows affected (0.00 sec)

The old syntax was meant to be only deprecated, but it was accidentally completely removed. This will be hopefully fixed in MySQL 5.7.7.

More changes involve the use of GRANT, REVOKE, CREATE USER, and ALTER USER, which now are more strict. You get warnings if you try to create users with the GRANT command, or when mixing granting of privileges with authentication options.

mysql> grant all on test.* to testuser identified by 'test';
Query OK, 0 rows affected, 1 warning (0.00 sec)

Warning (Code 1287): Using GRANT for creating new user is deprecated and will be removed in future release. Create new user with CREATE USER statement.

mysql> grant all on *.* to testuser identified by 'test';

Query OK, 0 rows affected, 1 warning (0.00 sec)

Warning (Code 1287): Using GRANT statement to modify existing user's properties other than privileges is deprecated and will be removed in future release. Use ALTER USER statement for this operation.

In short, if you have automated scripts that deal with installing and administering MySQL, you should test them with warnings enabled, and be prepared for broken compatibility with your old practice.

One such ‘old practice’ scripts that is broken by the new syntax changes is MySQL-Sandbox. I have just released an updated version (MySQL Sandbox 3.0.48) with a workaround for MySQL 5.7.6 changed SET PASSWORD syntax.

BTW, did I mention that the password field in the mysql.user table was removed? That’s another surprise that may break quite a lot of existing tests:

mysql> use mysql
Database changed
mysql> select host,user,password from user;
ERROR 1054 (42S22): Unknown column 'password' in 'field list'

mysql> select host, user, authentication_string from user;
+-----------+-------------+-------------------------------------------+
| host      | user        | authentication_string                     |
+-----------+-------------+-------------------------------------------+
| localhost | root        | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | rsandbox    | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |
| %         | testuser    | *94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29 |
+-----------+-------------+-------------------------------------------+
9 rows in set (0.00 sec)

Wednesday, November 25, 2015

Vote for my proposals!

Sunday, November 22, 2015

Saturday, November 21, 2015

Monday, November 16, 2015

Running servers on a regular host

Running servers in a sandbox

Running servers in virtual machines

Running servers in containers

All solutions comparison

MySQL deployment summary

What's next

Wednesday, November 04, 2015

First phase: deploying the containers

Monday, November 02, 2015

Enabling GTID: the dangerous approach.

Sunday, October 25, 2015

Installing Docker

Thursday, October 08, 2015

Monday, October 05, 2015

Monday, September 28, 2015

Tuesday, September 15, 2015

Monday, September 14, 2015

Monday, September 07, 2015

Wednesday, September 02, 2015

Monday, August 31, 2015

Parallel replication overview

Tuesday, August 25, 2015

Monday, August 24, 2015

INSTALLATION

MULTI-SOURCE

Wednesday, August 19, 2015

Introducing star topology.

Monday, August 17, 2015

Introduction to P2P all-masters topology

Friday, August 14, 2015

Introduction: where we stand

Wednesday, August 12, 2015

Crash-safe tables and Global transaction identifiers in MySQL 5.6 and 5.7

Monday, August 10, 2015

Thursday, August 06, 2015

Wednesday, August 05, 2015

Tuesday, August 04, 2015

Monday, August 03, 2015

Thursday, July 30, 2015

Sentinel data: Tap tap… Is this thing on?

Saturday, July 25, 2015

Sunday, July 19, 2015

Tuesday, March 10, 2015

Pages

Giuseppe Maxia

Like my posts? Buy me a book!

Julian's force

Yes, I am one of those

Sapere, saper fare, fare, far sapere

Blog Archive

Planet MySQL