The Data Charmer: 2018

Friday, April 20, 2018

MySQL adjustment bureau

When maintainng any piece of software, we usually deal with two kind of actions:

bug fixing,
new features.

bugs and features

A bug happens when there is an error in the software, which does not behave according to the documentation or the specifications. In short, it's a breech of contract between the software maintainer and the users. The promise, i.e. the software API that was published at every major version, is broken, and the software must be reconciled with the expectations and fixed, so that it behaves again as the documentation says. When we fix a bug in this way, we increment the revision number of the software version (e.g. 1.0.0 to 1.0.1. See semantic versioning).

New features, in turn, can be of two types:

backward compatible enhancements, which add value to the software without breaking the existing functionality. This is the kind of change that requires an increment of the minor indicator in the version (for example: 1.1.15 to 1.2.0.)
Incompatible changes that break the existing behavior and require users to change their workflow. This kind of change requires bumping up the major number in the version (as in 2.1.3 to 3.0.0.)

Not a bug, nor a feature, but an adjustment.

The above concepts seem simple enough: you either fix something that's broken or add new functionality.

However, when maintaining a tool that has the purpose of helping users to deal with another software (as it is the case of dbdeployer that helps users to deploy MySQL databases) there is yet another category of changes that don't fall into the standard categories: it's what happens when the software being helped (MySQL) changes its behavior, which would break the normal functioning of the helping tool, giving the maintainer a difficult choice:

shall I modify the tool's interface to adapt to the new behavior, breaking existing procedures?
or shall I adapt the tool's functioning behind the scenes to keep the interface unchanged?

My philosophy with dbdeployer (and MySQL-Sandbox before it) is to preserve the tool's interface, so that users don't have to change existing procedures. I call this kind of changes adjustments, because they are not bugs, as they are not a consequence of a coding error, and not a feature, as the intervention is not a conscious decision to add new functionality, but an emergency operation to preserve the status quo. You can think of this category as a capricious change in specifications, which so often happens to software developers, with the difference that the one changing the specs is not the user, but a third party who doesn't know, or care, about our goal of preserving the API integrity.

For example, from MySQL 8.0.3 to 8.0.4 there was a change in the default authentication plugin. Instead of mysql_native_password, MySQL 8.0.4 uses caching_sha2_password. The immediate side effect for MySQL-Sandbox and dbdeployer was that replication doesn't work out of the box. A possible solution would be to force the old authentication plugin, but this would not allow users to test the new one. Since the main reason to use a tool like dbdeployer is to experiment with new releases safely, I had to keep the default behavior. Thus, I left the default plugin in place, and changed the way the replication works. It's an ugly workaround actually, but allows users to see the new behavior without losing existing functionality.
To complete the adjustment, I added a new option --native-auth-plugin, which would deploy using the old mysql_native_password. In total, the adjustment consists of a behind-the-scenes change, almost undetectable by users, and a new option to keep using the familiar authentication if users want it.

From the point of view of semantic versioning, this kind of change is a backward-compatible modification of the API, which warrants an increase of the minor number of the version.

Another example: when MySQL went from 8.0.4 to 8.0.11, it introduced a deal breaker change: the X Plugin is now loaded by default. This is easy for users of MySQL as a document store, as they don't need to enable the plugin manually, but bad news for anyone else, as the server is opening a port and a socket that many users may not choose to open voluntarily. What's worse, when installing more sandboxes of version 8.0.11 in the same host (for example in replication), one will succeed in reserving the plugin port and socket, while the others will have the error log populated with surprising errors about a socket being already in use.

The solution is similar to the previous one. When dbdeployer detect MySQL 8.0.11 or newer, it adds options to customize the mysqlx plugin port and socket, thus allowing a frictionless deployment where the new functionality is available to the brave experimenters. At the same time, I added a new option (--disable-mysqlx) for the ones who really don't want an extra port and socket in their servers, not even for testing.

These adjustment are usually costly additions. While the added code is not that much, they require extra tests, which are often complex and require more time to write and execute them. The process to add an adjustment goes mostly like this:

I dedicate my morning walk to think about the fix. Sometimes the fix requires several walks, while I decide the less intrusive solution.
If the walk has been fruitful, writing the code requires just a few minutes. If I missed something, I iterate.
Then the more difficult part: writing meaningful tests that prove that the adjustment is correct and it doesn't introduce side effects in any MySQL version. And of course the option that reintroduces the old behavior must be tested too.
A positive side effect of this exercise is that often I realize that I was missing a test for an important behavior and then I write down that as well. The test suite included 6,000+ tests 1 month ago, and now it has almost doubled.

Tuesday, April 03, 2018

Test MySQL 8.0 right in your computer

MySQL 8.0 GA is right around the corner. I don't have precise information about its release, as I don't work at Oracle. If I did, I would probably know, but I couldn't tell when the release is scheduled to appear because of company policies. I can, however, speculate and infer, based of my experience with previous releases. My personal assessment is that the release will appear before 9:00am PT on April 24, 2018. The "before" can be anything from a few minutes to one week in advance.
Then, again, it may not happen at all if someone finds an atrocious bug that needs to be fixed asap.

Either way, users are keen on testing the new release in its current state of release candidate. Here I show a few methods that allow you to have a taste of the new goodies without waiting for the triumphal (keynote) announcement.

1. Docker containers

If you are a docker user, using a container to test MySQL is a no brainer. Unlike virtual machines or standalone servers, a docker container comes ready to use, with nothing to configure. All you need to do is pulling the right image. As with every docker images, you pull once and then use as many times as you need.

There are two reliable images that contain the latest MySQL. One is called mysql:8.0 and is tagged as official, which means that it is released by the Docker maintenance team. The other one, which is released by the MySQL team, is called mysql/mysql-server:8.0.

$ docker pull mysql:8.0
8.0: Pulling from library/mysql
Digest: sha256:7004063f8bd0c7bade8d1c526b9b8f5188c8288f411d76ee4ba83131e00c6f02
Status: Downloaded newer image for mysql:8.0

$ docker pull mysql/mysql-server:8.0
8.0: Pulling from mysql/mysql-server
Digest: sha256:e81d95f788adb04a4d2fa5f6f7e9283ca0f6360fb518efe65af5a7377a4ec282
Status: Downloaded newer image for mysql/mysql-server:8.0

The mysql image is based on Debian, while the original package, as you would expect, is based on Oracle Linux.

Let's see how to run MySQL in a container.

$ docker run --name official  -e MYSQL_ROOT_PASSWORD=secret -d mysql:8.0
60ec307578a139f5083ded07e94d737690d287b1b95093878675983a5cc40174

$ docker run --name original -e MYSQL_ROOT_PASSWORD=secret \
    -d mysql/mysql-server:8.0
0c93bb4a97ffa53232a69732d3ae45413a443e38fa43ad6fdc4057168cba42d2

With the above commands we get two containers, one for the official image and one for the original one.
We can't use them straight away, though. We need to wait for the servers to be ready. An easy method to verify the status of the server is looking at docker logs:

$ docker logs original --tail 1
2018-04-01T21:23:30.395461Z 0 [System] [MY-010931] /usr/sbin/mysqld: ready for connections. Version: '8.0.4-rc-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL).

$ docker logs original --tail 1
2018-04-01T21:23:30.395461Z 0 [System] [MY-010931] /usr/sbin/mysqld: ready for connections. Version: '8.0.4-rc-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL).

Here, after about 10 seconds, both containers are ready to use. We can now access the servers. One easy method is through docker exec

$ docker exec -ti original mysql -psecret
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 8.0.4-rc-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

A similar command would allow us to access the other container.

If you want to try replication, more work is needed. In these articles you will find more details on Docker operations, and examples of advanced deployments:

2. Sandboxes

A sandboxed database is deployed in a non-dedicated box, with its configuration altered in such a way that it will run independently from other similar deployment and even from databases running in the main space.
The granddaddy of the sandbox deployer was MySQL-Sandbox, which has recently evolved into the more powerful and easier to use dbdeployer.
You can use MySQL-Sandbox to test a MySQL 8.0 tarball on MacOS

$ make_sandbox --export_binaries  mysql-8.0.4-rc-macos10.13-x86_64.tar.gz

This command unpacks the tarball into $HOME/opt/mysql and deploys the database in $HOME/sandboxes/msb_8_0_4.
Until recently, the same command would work on Linux without modifications. In MySQL 8.0.4, though, the tarball organization for Linux has changed. There are symbolic links for SSL libraries inside the ./bin directory. Those symlinks are not extracted by default, but only if you use the option --keep-directory-symlink when opening the tarball. MySQL-Sandbox doesn't do it, also because this option is not standard to every version of tar.

Thus, if you want to use the old MySQL-Sandbox, you need to run the extraction manually.

$ cd $HOME/opt/mysql
$ tar -xzf  --keep-directory-symlink /tmp/mysql-8.0.4-rc-linux-glibc2.12-x86_64.tar.gz
$ mv mysql-8.0.4-rc-linux-glibc2.12-x86_64 8.0.4
$ make_sandbox 8.0.4

I don't recommend the above procedure, for either Linux or MacOS. The main reason, in addition to the manual operations involved, is that MySQL-Sandbox is not going to be updated for the time being. Instead, you should use dbdeployer, which has all the main features of MySQL-Sandbox and a lot of new ones. Here's the equivalent procedure:

$ dbdeployer unpack /tmp/mysql-8.0.4-rc-linux-glibc2.12-x86_64.tar.gz
$ dbdeployer deploy single 8.0.4
Database installed in $HOME/sandboxes/msb_8_0_4
run 'dbdeployer usage single' for basic instructions'
. sandbox server started

dbdeployer uses a different method to initialize the database server, which at the same time makes the initialization more visible and avoids the problem of the phantom SSL libraries.

Note: Tarballs for recent MySQL versions are really big. MySQL 8.0.4 binaries expand to 1.9 GB. If storage is an issue, you should get the tarballs from a collection of minimised tarballs (Linux only) for most MySQL versions. For now, it's maintained by me, but I hope that the the MySQL team will release something similar.

Once you have deployed a sandbox with MySQL 8.0, using it is easy:

$ cd $HOME/sandboxes/msb_8_0_4
$ ./use
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.4-rc-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost] {msandbox} ((none)) >

dbdeployer creates several shortcuts for the most common commands to use the database. ./use is the most common, and provides access to the MySQL client with all the necessary options needed to use it correctly. For more information on what is available, run

$ dbdeployer usage single

This functionality would be enough to decide for a sandbox as your preferred method for testing. However, it this is only a tiny portion of what you can do with dbdeployer in your own computer. With a single command, you can test master/slave replication, multi-primary group replication, single primary group replication, fan-in, and all-masters topologies.

You can try the following commands:

$ dbdeployer deploy single 8.0.4
$ dbdeployer deploy replication 8.0.4
$ dbdeployer deploy replication 8.0.4 --topology=group
$ dbdeployer deploy replication 8.0.4 --topology=group --single-primary
$ dbdeployer deploy replication 8.0.4 --topology=all-masters
$ dbdeployer deploy replication 8.0.4 --topology=fan-in

If you have enough RAM, all these deployments will survive in parallel.
In my desktop, I can run:

$ dbdeployer sandboxes --header
name                        type                    version  ports
----------------            -------                 -------  -----
all_masters_msb_8_0_4     : all-masters               8.0.4 [15001 15002 15003]
fan_in_msb_8_0_4          : fan-in                    8.0.4 [14001 14002 14003]
group_msb_8_0_4           : group-multi-primary       8.0.4 [20009 20134 20010 20135 20011 20136]
group_sp_msb_8_0_4        : group-single-primary      8.0.4 [21405 21530 21406 21531 21407 21532]
msb_8_0_4                 : single                    8.0.4 [8004]
rsandbox_8_0_4            : master-slave              8.0.4 [19009 19010 19011]

When MySQL 8.0.11 is released, you can replace "8.0.4" with "8.0.11" and get a similar result.

BTW, you have seen that deploying replication sandboxes may take a long time. You may try adding --concurrent to each command, and enjoy a notable speed increase.

What else can you do with the sandboxes you have just deployed? Plenty! For a complete list, have a look at the online documentation. But for the moment, you may try this:

$ dbdeployer global status
$ dbdeployer global test
$ dbdeployer global test-replication

3. Other methods

Besides the methods that I recommend, there are others that you could use, but I won't advise about them as there are more qualified ones for that.

Standalone server. If you have the luxury of having one or more standalone servers sitting in a lab, by all means go for it. Just follow the instructions about installing MySQL on your lucky server. Be advised, though, that depending on the method you choose and the version of your operating system, you may face compatibility issues (.rpm or .deb dependencies).
Virtual machines. VMs share with standalone servers the same ease of installation (and the same dependency issues), only a bit slower. They are convenient, as you can use them to test in conditions that more closely resemble production settings, and if you use a configuration server such as Puppet or Ansible, your task of testing the new version could be greatly simplified. The instructions for the virtual machines are the same seen for standalone servers.

Monday, April 02, 2018

dbdeployer GA and semantic versioning

dbdeployer went into release candidate status a few weeks ago. Since then, I added no new features, but a lot of tests. The test suite now runs 3,000+ tests on MacOS and a bit more on Linux, for a grand total of 6,000+ tests that need to run at least twice: once with concurrency enabled and once without. I know that testing can't prove the absence of bugs, but I am satisfied with the results, since all this grinding has allowed me to find several bugs and fix them.

In this framework, I felt that dbdeployer could exit candidate status and get to version 1.0. This happened on March 26th. An immediate side effect of this change is that from this point on, dbdeployer must adhere to the semantic versioning principles:

A version number is made of Major, Minor, and Revision. When changes are applied, the following happens:

Backward-compatible bug fixes increment the Revision number (e.g. 1.0.0 to 1.0.1)
Backward-compatible new features increment the Minor number (1.0.1 to 1.1.0)
Backward incompatible changes (either features or bug fixes that break compatibility with the API) increment the Major number (1.15.9 to 2.0.0)

The starting API is defined in API-1.0.md, which was generated manually.
The file API-1.1.md contains the same API definition, but was generated automatically and can be used to better compare the initial API with further version.

So the app went from 1.0 to 1.1 in less than one week. In obedience to semantic versioning principles, if a new backward-compatible feature is added, the minor number of the version increases. What does backward-compatible mean? It means that commands, procedures, and workflows that were working with the previous version will also work with the current one. It's just that the new release will have more capabilities. In this case, the added feature is the ability of having environment variables HOME and PWD recognized and properly expanded in the configuration file. It's nothing very exciting, but changing the minor number gives the user a hint of what to expect from the new release.

Let's give a few examples:

Version goes from 1.0.0 to 1.0.1: It means that there are only bug fixes, and you should expect to use it without modifications.
Version goes from 1.0.1 to 1.1.0: You should be able to use dbdeployer just as before, but you should check the release notes to see what's new, because there are new functionalities that might be useful to you.
Version goes from 1.3.15 to 2.0.0: Danger! A major number bumped up means that something has changed in the API, which is now partially or totally incompatible with the previous release. Your workflow may break, and you must check the release notes and the documentation to learn how to use the new version.

This is different from other applications. For example, the MySQL server uses version numbers with hard to predict meaning:

MySQL 5.1, 5.5, 5.6, and 5.7 should be, in fact, major version number changes, not minor ones. Each one of them introduces incompatible changes that require careful review of the novelties.
Within the same version (such as MySQL 5.7) there are a lot of compatible and incompatible changes, although the minor number stays the same.

The plan with dbdeployer is to use the version number as a manifest, to give users an immediate feeling of what to expect. Rather than changing minor or major number only when the developers think there is some juicy new thing of which they can be proud, the version number will tell whether users should worry about compatibility or not.

In my general development plan, you are more likely to see versions like "1.25.16" than version "2.0," meaning that I will try to keep the current API valid as much as possible. A major version change will signify that a new feature could not fit in the current infrastructure and a new one would be needed.

You can draw your own conclusions here. A semantic versioning paradigm is unlikely to be adopted by most software vendors, because version numbers are often marketing gimmicks, and they can charge you more convincingly for a version 6.0 than for version 1.34.
Free software, OTOH, can do this. My goal with dbdeployer is to help the MySQL community, and I will achieve that goal more easily if my releases can be adopted without fear of incompatibility.

Monday, March 12, 2018

dbdeployer release candidate

The latest release of dbdeployer is possibly the last one with a leading 0. If no serious bugs are found in the next two weeks, the next release will bear a glorious 1.0.

Latest news

The decision to get out of the stream of pre-releases that were published until now comes because I have implemented all the features that I wanted to add: mainly, all the ones that I wished to add to MySQL-Sandbox but it would have been too hard:

concurrent deployment and deletion
group replication;
multi-source replication;
Full customisation
Centralised administration.
Container-based tests;
Mock tests that can run without any real database server.

The latest addition is the ability of running multi-source topologies. Now we can run four topologies:

master-slave is the default topology. It will install one master and two slaves. More slaves can be added with the option --nodes.
group will deploy three peer nodes in group replication. If you want to use a single primary deployment, add the option --single-primary. Available for MySQL 5.7 and later.
fan-in is the opposite of master-slave. Here we have one slave and several masters. This topology requires MySQL 5.7 or higher.
all-masters is a special case of fan-in, where all nodes are masters and are also slaves of all nodes.

It is possible to tune the flow of data in multi-source topologies. The default for fan-in is three nodes, where 1 and 2 are masters, and 2 are slaves. You can change the predefined settings by providing the list of components:

$ dbdeployer deploy replication \
    --topology=fan-in \
    --nodes=5 \
    --master-list="1 2 3" \
    --slave-list="4 5" \
    8.0.4 \
    --concurrent

In the above example, we get 5 nodes instead of 3. The first three are master (--master-list="1 2 3") and the last two are slaves (--slave-list="4 5") which will receive data from all the masters. There is a test automatically generated to test replication flow. In our case it shows the following:

$ ~/sandboxes/fan_in_msb_8_0_4/test_replication
# master 1
# master 2
# master 3
# slave 4
ok - '3' == '3' - Slaves received tables from all masters
# slave 5
ok - '3' == '3' - Slaves received tables from all masters
# pass: 2
# fail: 0

The first three lines show that each master has done something. In our case, each master has created a different table. Slaves in nodes 5 and 6 then count how many tables they found, and if they got the tables from all masters, the test succeeds.
Note that for all-masters topology there is no need to specify master-list or slave-list. In fact, those lists will be auto-generated, and they will both include all deployed nodes.

What now?

Once I make sure that the current features are reasonably safe (I will only write more tests for the next 10~15 days) I will publish the first (non-pre) release of dbdeployer. From that moment, I'd like to follow the recommendations of the Semantic Versioning:

The initial version will be 1.0.0 (major, minor, revision);
The spects for 1.0 will be the API that needs to be maintained.
Bug fixes will increment the revision counter.
New features that don't break compatibility with the API will increment the minor counter;
New features or changes that break compatibility will trigger a major counter increment.

Using this method will give users a better idea of what to expect. If we get a revision number increase, it is only bug fixes. An increase in the minor counter means that there are new features, but all previous features work as before. An increase in the major counter means that something will break, either because of changed interface or because of changed behavior.
In practice, the tests released with 1.0.0 should run with any 1.x subsequent version. When those tests need changes to run correctly, we will need to bump up the major version.

Let's see if this method is sustainable. So far, I haven't had need to do behavioural changes, which are usually provoked by new versions of MySQL that introduce incompatible behavior (definitely MySQL does not follow the Semantic Versioning principles.) When the next version becomes available, I will see if this RC of dbdeployer can stand its ground.

Sunday, March 11, 2018

Concurrent sandbox deployment

Version 0.3.0 of dbdeployer has gained the ability of deploying multiple sandboxes concurrently. Whenever we deploy a group of sandboxes (replication, multiple) we can use the --concurrent flag, telling dbdeployer that it should run operations concurrently.

What happens when a single sandbox gets deployed? There are six sets of operations:

Create the sandbox directory and write down its scripts;
Run the initialisation script;
Start the database server;
Run the pre-grants SQL commands (if any;)
Load the grants;
Run the post-grants SQL commands (if any;)

When several sandboxes are deployed concurrently, dbdeployer runs only the first step, and then creates a list of commands with an associated priority index. These commands are assembled for every sandbox, and then executed concurrently for every step.
The sequence of events for a deployment of three sandboxes in replication would be like this:

Create the sandbox skeleton for every sandbox;
Initialise all database servers;
start all the servers;
run the pre-grants, grants, post-grants scripts.
Runs the group initialisation script (start master and slaves, or setup group replication).

Depending on the computer architecture, the server version, and the number of nodes, the speed of deployment can increase from 2 to 5 times.

Let's see an example:

$ time dbdeployer deploy replication 5.7.21
[...]
real    0m13.789s
user    0m1.143s
sys 0m1.873s

$ time dbdeployer deploy replication 5.7.21 --concurrent
[...]
real    0m7.780s
user    0m1.329s
sys 0m1.811s

There is a significant speed increase. The gain rises sharply if we use an higher number of nodes.

$ time dbdeployer deploy replication 5.7.21 --nodes=5
[...]
real    0m23.425s
user    0m1.923s
sys 0m3.106s

$ time dbdeployer deploy replication 5.7.21 \
    --nodes=5 --concurrent
[...]
real    0m7.686s
user    0m2.248s
sys 0m2.777s

As we can see, the time for deploying 5 nodes is roughly the same used for 3 nodes. While the sequential operations take time proportionally with the number of nodes, the concurrent task stays almost constant.

Things a re a bit different for group replication, as the group initialisation (which happens after all the servers are up and running) takes more time than the simple master/slave deployment, and can't be easily reduced using the current code.

A similar optimisation happens when we delete multiple sandboxes. Here the operation is at sandbox level (1 replication cluster = 1 sandbox) not at server level, and for that reason the gain is less sharp. Still, operations are noticeably faster.

There is room for improvement, but I have seen that the total testing time for dbdeployer test suite has dropped from 26 to 15 minutes. I think it was a week end well spent.

Monday, March 05, 2018

Customizing dbdeployer

As of version 0.2.1, dbdeployer allows users to customize composite sandboxes more than ever. This is done by manipulating the default settings, which are used to deploy the sandbox templates.

In order to appreciate the customization capabilities, let's start with a vanilla deployment, and then we have a look at the possible changes.

$ dbdeployer deploy replication 8.0.4
Installing and starting master
Database installed in $HOME/sandboxes/rsandbox_8_0_4/master
. sandbox server started
Installing and starting slave 1
Database installed in $HOME/sandboxes/rsandbox_8_0_4/node1
. sandbox server started
Installing and starting slave 2
Database installed in $HOME/sandboxes/rsandbox_8_0_4/node2
. sandbox server started
$HOME/sandboxes/rsandbox_8_0_4/initialize_slaves
initializing slave 1
initializing slave 2
Replication directory installed in $HOME/sandboxes/rsandbox_8_0_4
run 'dbdeployer usage multiple' for basic instructions'

A regular replication sandbox has one master and two slaves. Each slave is inside a directory called nodeX.

The resulting sandbox has a directory called master, two nodeX directories, a shortcut for the master called m, and two shortcuts for the slaves called s1 and s2. There are also two management scripts called initialize_slaves and check_slaves.

    $ ls -l ~/sandboxes/rsandbox_8_0_4/
    total 152
    -rwxr--r--   1 user  staff  1500 Mar  5 06:21 check_slaves
    -rwxr--r--   1 user  staff  1160 Mar  5 06:21 clear_all
    -rwxr--r--   1 user  staff  1617 Mar  5 06:21 initialize_slaves
    -rwxr--r--   1 user  staff   806 Mar  5 06:21 m
    drwxr-xr-x  22 user  staff   748 Mar  5 06:21 master
    -rwxr--r--   1 user  staff   806 Mar  5 06:21 n1
    -rwxr--r--   1 user  staff   804 Mar  5 06:21 n2
    -rwxr--r--   1 user  staff   804 Mar  5 06:21 n3
    drwxr-xr-x  23 user  staff   782 Mar  5 06:21 node1
    drwxr-xr-x  23 user  staff   782 Mar  5 06:21 node2
    -rwxr--r--   1 user  staff   855 Mar  5 06:21 restart_all
    -rwxr--r--   1 user  staff   804 Mar  5 06:21 s1
    -rwxr--r--   1 user  staff   804 Mar  5 06:21 s2
    -rw-r--r--   1 user  staff   173 Mar  5 06:21 sbdescription.json
    -rwxr--r--   1 user  staff  1127 Mar  5 06:21 send_kill_all
    -rwxr--r--   1 user  staff  1296 Mar  5 06:21 start_all
    -rwxr--r--   1 user  staff  1680 Mar  5 06:21 status_all
    -rwxr--r--   1 user  staff  1087 Mar  5 06:21 stop_all
    -rwxr--r--   1 user  staff  4598 Mar  5 06:21 test_replication
    -rwxr--r--   1 user  staff  1315 Mar  5 06:21 test_sb_all
    -rwxr--r--   1 user  staff  1100 Mar  5 06:21 use_all

Now, let's see how we can change this. We'll start by listing the current defaults

$ dbdeployer defaults show
# Internal values:
{
    "version": "0.2.1",
    "sandbox-home": "$HOME/sandboxes",
    "sandbox-binary": "$HOME/opt/mysql",
    "master-slave-base-port": 11000,
    "group-replication-base-port": 12000,
    "group-replication-sp-base-port": 13000,
    "fan-in-replication-base-port": 14000,
    "all-masters-replication-base-port": 15000,
    "multiple-base-port": 16000,
    "group-port-delta": 125,
    "master-name": "master",
    "master-abbr": "m",
    "node-prefix": "node",
    "slave-prefix": "slave",
    "slave-abbr": "s",
    "sandbox-prefix": "msb_",
    "master-slave-prefix": "rsandbox_",
    "group-prefix": "group_msb_",
    "group-sp-prefix": "group_sp_msb_",
    "multiple-prefix": "multi_msb_",
    "fan-in-prefix": "fan_in_msb_",
    "all-masters-prefix": "all_masters_msb_"
 }

The values that we want to change are master-name, master-abbr, node-prefix, slave-prefix, and slave-abbr. We can export the defaults to a file, and import them after editing the values we want to change.

$ dbdeployer defaults export defaults.json
# Defaults exported to file defaults.json
$ vim defaults.json
$ dbdeployer defaults import defaults.json
Defaults imported from defaults.json into $HOME/.dbdeployer/config.json

Now dbdeployer is using the new defaults.

$ dbdeployer defaults show
# Configuration file: $HOME/.dbdeployer/config.json
{
    "version": "0.2.1",
    "sandbox-home": "/Users/gmax/sandboxes",
    "sandbox-binary": "/Users/gmax/opt/mysql",
    "master-slave-base-port": 11000,
    "group-replication-base-port": 12000,
    "group-replication-sp-base-port": 13000,
    "fan-in-replication-base-port": 14000,
    "all-masters-replication-base-port": 15000,
    "multiple-base-port": 16000,
    "group-port-delta": 125,
    "master-name": "primary",
    "master-abbr": "p",
    "node-prefix": "branch",
    "slave-prefix": "replica",
    "slave-abbr": "r",
    "sandbox-prefix": "msb_",
    "master-slave-prefix": "rsandbox_",
    "group-prefix": "group_msb_",
    "group-sp-prefix": "group_sp_msb_",
    "multiple-prefix": "multi_msb_",
    "fan-in-prefix": "fan_in_msb_",
    "all-masters-prefix": "all_masters_msb_"
 }

We have now *primary* for *master*, *replica* for *slave*, *branch* for *node*, and the abbreviations for master and slave changed to *p* and *r* respectively.
Let's see how these defaults can play together when we run the same command as we did before for replication. We first remove the previous deployment.


$ dbdeployer delete rsandbox_8_0_4
List of deployed sandboxes:
$HOME/sandboxes/rsandbox_8_0_4
Running $HOME/sandboxes/rsandbox_8_0_4/stop_all
# executing "stop" on $HOME/sandboxes/rsandbox_8_0_4
executing "stop" on slave 1
executing "stop" on slave 2
executing "stop" on master
Running rm -rf $HOME/sandboxes/rsandbox_8_0_4
Sandbox $HOME/sandboxes/rsandbox_8_0_4 deleted

The deployment command is the same as before, but the output changes:

$ dbdeployer deploy replication 8.0.4
Installing and starting primary
Database installed in $HOME/sandboxes/rsandbox_8_0_4/primary
. sandbox server started
Installing and starting replica 1
Database installed in $HOME/sandboxes/rsandbox_8_0_4/branch1
. sandbox server started
Installing and starting replica 2
Database installed in $HOME/sandboxes/rsandbox_8_0_4/branch2
.. sandbox server started
$HOME/sandboxes/rsandbox_8_0_4/initialize_replicas
initializing replica 1
initializing replica 2
Replication directory installed in $HOME/sandboxes/rsandbox_8_0_4
run 'dbdeployer usage multiple' for basic instructions'

This looks already as if our defaults have been adopted. Let's see the sandbox itself:

$ ls -l ~/sandboxes/rsandbox_8_0_4/
total 152
drwxr-xr-x  23 user  staff   782 Mar  5 06:45 branch1
drwxr-xr-x  23 user  staff   782 Mar  5 06:45 branch2
-rwxr--r--   1 user  staff  1515 Mar  5 06:45 check_replicas
-rwxr--r--   1 user  staff  1170 Mar  5 06:45 clear_all
-rwxr--r--   1 user  staff  1629 Mar  5 06:45 initialize_replicas
-rwxr--r--   1 user  staff   807 Mar  5 06:45 n1
-rwxr--r--   1 user  staff   806 Mar  5 06:45 n2
-rwxr--r--   1 user  staff   806 Mar  5 06:45 n3
-rwxr--r--   1 user  staff   807 Mar  5 06:45 p
drwxr-xr-x  22 user  staff   748 Mar  5 06:45 primary
-rwxr--r--   1 user  staff   806 Mar  5 06:45 r1
-rwxr--r--   1 user  staff   806 Mar  5 06:45 r2
-rwxr--r--   1 user  staff   855 Mar  5 06:45 restart_all
-rw-r--r--   1 user  staff   173 Mar  5 06:45 sbdescription.json
-rwxr--r--   1 user  staff  1137 Mar  5 06:45 send_kill_all
-rwxr--r--   1 user  staff  1308 Mar  5 06:45 start_all
-rwxr--r--   1 user  staff  1700 Mar  5 06:45 status_all
-rwxr--r--   1 user  staff  1097 Mar  5 06:45 stop_all
-rwxr--r--   1 user  staff  4613 Mar  5 06:45 test_replication
-rwxr--r--   1 user  staff  1325 Mar  5 06:45 test_sb_all
-rwxr--r--   1 user  staff  1106 Mar  5 06:45 use_all

We see that the new defaults were used and the script names have changed. But the differences are deeper than this. Also the internal values in the scripts were changed accordingly.

$ ~/sandboxes/rsandbox_8_0_4/test_replication
# primary log: mysql-bin.000001 - Position: 14073 - Rows: 20
# Testing replica #1
ok - replica #1 acknowledged reception of transactions from primary
ok - replica #1 IO thread is running
ok - replica #1 SQL thread is running
ok - Table t1 found on replica #1
ok - Table t1 has 20 rows on #1
# Testing replica #2
ok - replica #2 acknowledged reception of transactions from primary
ok - replica #2 IO thread is running
ok - replica #2 SQL thread is running
ok - Table t1 found on replica #2
ok - Table t1 has 20 rows on #2
# Tests :    10
# failed:     0 (  0.0%)
# PASSED:    10 (100.0%)
# exit code: 0

The test script calls the components with the names that we defined in the new defaults. Let's have a look at what the shortcuts for the master and slaves (now primary and replicas) do:

$ ~/sandboxes/rsandbox_8_0_4/p
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 35
Server version: 8.0.4-rc-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

primary [localhost] {msandbox} ((none)) >

$ ~/sandboxes/rsandbox_8_0_4/r1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 8.0.4-rc-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

replica1 [localhost] {msandbox} ((none)) >

Also the internal prompt has been adapted to the new naming.

Should we want to revert to the old behavior, we can just reset the defaults:

$ dbdeployer defaults reset
#File $HOME/.dbdeployer/config.json removed

The current replication sandbox is left untouched, but the next one will use the default values.

If we don't want to change the defaults permanently, there is an alternative. The --defaults flag allows us to change defaults on-the-fly just for the command we're running. For example, we could have achieved the same result, without editing the configuration file, using this command:

    dbdeployer deploy replication 8.0.4 \
        --defaults=master-name:primary \
        --defaults=master-abbr:p \
        --defaults=slave-prefix:replica \
        --defaults=slave-abbr:r \
        --defaults=node-prefix:branch

The syntax for --defaults requires the name of the variable and the new value, separated by a colon. The flag can be used as many times as needed.

MySQL security for real users

Security features overview

One of Oracle's tenets is the focus on security. For this reason, when it took over the stewardship of MySQL, it started addressing the most common issues. It was not quick acting, but we have seen real progress:

MySQL 5.7 has removed the anonymous accounts, which was the greatest threat to security. Because of those accounts, and the default privileges granted to them, users without any privileges could access the "test" database and do serious damage. Additionally, because of the way the privilege engine evaluates accounts, anonymous users could hijack legitimate users, by preventing them to work properly.
The "root" account now comes with a password defined during initialization. This is good news for security, but bad news for how the change was implemented.
There is a new way of setting an options file for connection credentials: the mysql_config_editor paired with option --login-path allows users to store encrypted credentials for secure use. Also here, while we should rejoice for the added security, we can't help feeling that the implementation is yet again far from meeting users needs.
There is an useful warning (introduced in MySQL 5.6) when using a password on the command line, telling users that it is a risk. Also in this case, we have a usability issue: while users care about their production deployments and use option files to avoid using passwords on the command line, there are, nonetheless, a lot of testing scripts, used in safe environment or with non-valuable data, where a password in the command line was not an issue, and the new warning simply screws up the result of those carefully crafted tests. This change, which can't be overcome without modifying the MySQL clients code, needs users to change their existing tests to adapt to the new behavior.
MySQL 8 introduces roles, which simplify the accounts management. There are some minor usability issues, although in general the feature meets expectations.

This is the scenario of the main enhancements in MySQL since 5.6. Each one of them has some usability problems, some minor, some really bad.
We will first have a look at the problems mentioned above, and then examine the root cause for why they have arisen.

Usability issues

I start by noticing that some developers in the MySQL team have been working there for many years, starting with the time when MySQL was a different database and was used really differently.
In those times, managing the database meant that a human (the DBA) would run operations manually, take a look at the result, and adjust when needed. And then, when things went wrong, the same human explored the database system to find out what happened, took action, and went back to sleep.

Human-centered management leads to human problems: lazy DBA left their databases without password, using the root account, and exposing the server to uninspired attacks; they used passwords on the command line, without caring for options files (or without knowing about them.) Careless DBAs did not deal with anonymous users, leaving a dangerous backdoor in their server.

Some of the new functionalities introduced in the latest MySQL versions are aimed at this type of users: when you install MySQL, you get a message saying: your root password is ************, and the lazy DBAs have no option but to take note and use it. When they use the password on the command line, the annoying warning forces them to start using an options file or the mysql_config_editor.

This is all good, but the main problem here is that the DBAs of 10 years ago are on the verge of extinction. They are replaced by a new breed of DBAs who are not lazy, because they can't afford to be, and need to use dozens, hundreds, thousands of databases at once, using configuration management tools that don't require manual intervention, and actually abhor it. In the land of automation, some of the MySQL security enhancements are not seen as a solution, but as new problems.

Let's see an interesting example: docker containers.

Using Docker, MySQL images are deployed using a password on the command line. This is done for compatibility with the first implementation of the image maintained by the Docker team, where you deploy with this syntax:

docker run -e MYSQL_ROOT_PASSWORD=secret -d mysql

The MYSQL_ROOT_PASSWORD is a directive that becomes an environment variable inside the container, and the server uses it during initialization. As you can imagine, this is not recommended for a secure environment. Then, what's the MySQL team recommendation? They suggest the same strategy used for manual installation: set a directive MYSQL_RANDOM_ROOT_PASSWORD that results in a random password being generated, then collected by the DBA and used. Alternatively, the directive MYSQL_ONETIME_PASSWORD will force the root user to change the password on first connection.

The above suggestions were designed with the ancient DBA still in mind, while container deployment is even more automated than VMs, and it is based on the principle of immutable objects, i.e. containers that spring up from the cloud ready to run, with no configuration needed, and especially no configuration that requires someone (or some tool) to extract a new password from a log. I proposed a different solution, that would never show passwords on the command line and while it was implemented, but it still feels like a hack to circumvent an inadequate design.

As a result, the implementation inside the MySQL recommended Docker image uses "--initialize-insecure" to start the server. This is an implicit recognition of the bad design of the initialization feature. What was designed to overcome DBA's laziness becomes an obstacle towards automation.

We have a similar problem with mysql_config_editor: the tool will create a safe configuration file with credentials for multiple instances, but the password must be inserted manually. Consequently, this potentially useful feature doesn't get adopted, because it would be too difficult or impossible to automate properly.

We have seen that, of the security features that were introduced lately, only a few can be used safely in an automated environment, and all of them have at least one small usability quirk. I have talked about a confusing issue related to the removal of anonymous users where in their eagerness of removing the vulnerability the MySQL team removed also the "test" database, which was a consequence, not the cause of the problem. And I have recently talked about roles usability where there are still open problems, like the ability of telling roles from users which are apparently not considered a bug by the MySQL team.

All the above considerations led me to ask: how did we get to this point? There is an active community, and feedback is offered often with plenty of detail. How come we have such an abundance of usability issues? Don't the developers spend time with users at conferences to learn what they expect? Don't they read articles and blog posts about how a new feature meets expectations? Don't they talk to customers who have adopted new features? They certainly do. Then, why the usability problems persist?

What follows is my observation and speculation on this matter.

Disconnection between MySQL developers and users community

My experience working with system providers has put me in contact with many users. I have seen that in most cases users are very much protective of their current deployment, because it took them long time to get it right, and they don't upgrade unless they don't have another choice. I've seen users test the newer versions, realize that they would break some of their procedures, and defer the upgrade to better times that never come. I remember last year a user with a not so large set of servers was considering an upgrade to MySQL 5.6, while 5.7 had been GA for two years. The reason was a set of incompatibilities that made the upgrade too difficult.

For companies that deal with thousands of servers, the problem is similar, but exacerbated by the quantity of servers to upgrade and the need to do it without stopping operations. This latest requirement has made some users decide not to use GTID, because it required offline time for a master, and they hadn't had time enough to test the upgrade to MySQL 5.7 that would solve that problem.

For one reason or the other, many companies upgrade only two or three years after a given version became GA. And this is the main problem: until they use it in production, or at least test the version for a projected upgrade, users can't give valuable feedback, the one that is related to usage in production, and when they do, the version for which they provide feedback has been GA for long time, and can't be changed, while the next one is already close to GA, and as such will be untouchable.

The MySQL team gets feedback on a release from a handful of curious users who don't delay testing until the new version is GA, but don't provide the kind of important feedback that get the development team attention, such as deployment in production by large customers. In many cases, large customers are the ones that upgrade several years after GA, and by then their input is difficult to act upon.

We have then a curious situation, where the warnings given by the early software explorers are confirmed years later by the users to which the MySQL team listens more closely, but by then the next version of the server is already locked in a release schedule that nobody wants to alter to fix usability issues.

How can we solve this problem? Simple: listen to early software explorers and try to fix problems before GA.

Thursday, March 01, 2018

Using MySQL 8.0: what to expect

MySQL 8.0 will be GA soon (just my assumption: Oracle doesn't tell me anything about its release plans) and it's time to think about having a look at it.
If this is your first try of MySQL 8, get prepared for several impacting differences from previous versions.

In this article I won't tell you what you can do with MySQL 8: there is plenty of material about this, including in this very blog. I will instead concentrate on differences from previous versions that users need to know if they want to avoid surprises.

Data Directory

Let's start with an observation of the data directory.
After a standard installation, without any additional options, I see the following:

Files that I expected to see

auto.cnf
ib_buffer_pool
ib_logfile0
ib_logfile1
ibdata1
ibtmp1
(dir) mysql
(dir) performance_schema
(dir) sys

These files are also present in 5.7.

Files that are new in 8.0

binlog.000001
binlog.index

log-bin is ON by default. You need to remember this if you are using a MySQL server for a benchmark test that used to run without binary logs.

ca-key.pem
ca.pem
client-cert.pem
client-key.pem
private_key.pem
public_key.pem
server-cert.pem
server-key.pem

Now the MySQL generates all the certificates needed to run connections securely. This will greatly simplify your task when setting up a new instance.

mysql.ibd

This was completely unexpected! The mysql database has now its own tablespace. This is probably due to the new Data Dictionary, which is implemented in InnoDB. You will notice that all the InnoDB tables in MySQL use this tablespace, not only dictionary tables. This will help keeping administrative data separate from operational data in the rest of the server.

undo_001
undo_002

The undo logs have now their own tablespace by default.

Global variables

There are a lot of changes in global variables. Here's the list of what will impact your work when you use MySQL 8.0 for the first time:

character_set_client        utf8mb4
character_set_connection    utf8mb4
character_set_database      utf8mb4
character_set_results       utf8mb4
character_set_server        utf8mb4

All character sets are now utf8mb4. In MySQL 5.7, the default values are a mix of utf8 and latin1.

default_authentication_plugin   caching_sha2_password

This is huge. Using this plugin, passwords are stored in a different way, which guarantees more security, but will probably break several workflows among the users. The bad thing about this change implementation is that this password format contains characters that don't display well on screen, and you can see garbled output when inspecting the "user" table.

local_infile    OFF

Loading local files is now prevented by default. If you have a workflow that requires such operations, you need to enable it.

log_bin ON
log_slave_updates   ON

We've seen from an inspection of the local directory that binary logging is enabled by default. But also very important is that log_slave_update is enabled. This is important to have slaves ready to replace a master, but will severely affect performance in those scenarios where some slaves were supposed to run without that feature.

master_info_repository  TABLE
relay_log_info_repository   TABLE

Also impacting performance is the setting for replication repositories, which are now on TABLE by default. This is something that should have happened already in MySQL 5.6 and was long overdue.

Surprisingly, something that DOES NOT get enabled by default is Global Transaction Identifiers (GTID). This is also a legacy from decisions taken in MySQL 5.6. Due to the GTID implementation, enabling them by default is not possible when upgrading from a previous version. With new data in a fresh installation, it is safe to enable GTID from the start.

Users

There are two new users when the server is created:

mysql.infoschema
mysql.session

Theoretically, mysql.session also exists in 5.7, but it was introduced long after GA, so it still qualifies as a novelty.

Then, when the server starts, you get a grand total of 4 users (root and mysql.sys are inherited from MySQL 5.7.)

Mixed oddities

When MySQL initializes, i.e. when the server starts for the first time and creates the database, you will notice some slowness, compared to previous versions. This is in part due to the data dictionary, which needs to create and fill 30 tables, but it is not a big deal in terms of performance. In some systems, though, the slowness is so acute that you start worrying about the server being stuck.

I noticed this problem in my Intel NUC running with SSD storage. In this box, the initialization time took a serious hit:

Version	time
5.0.96	1.231s
5.1.72	1.346s
5.5.52	2.441s
5.6.39	5.540s
5.7.21	6.080s
8.0.3	7.826s
8.0.4	38.547s

There is no mistype. The initialization for 8.0.4 lasts 6 times more than 5.7.
This doesn't happen everywhere. On a Mac laptop running on SSD the same operation takes almost 9 seconds, while 5.7 deploys in less than 5. It is still a substantial difference, one that has totally disrupted my regular operations in the NUC. I investigated the matter, and I found the reason. In 8.0, we have a new (hidden) table in the data dictionary, called st_spatial_reference_systems. Up to MySQL 8.0.3, this table was filled using a single transaction containing roughly 5,000 REPLACE INTO statements. It is a lot of data, but it happens quickly. For comparison, in MySQL 8.0.3 the initialization is only 2 seconds slower than 5.7.
The reason for the slowness in 8.0.4 is that there was a new command added to the syntax: CREATE SPATIAL REFERENCE SYSTEM, which is now used 5,000 times to fill the table that was previously filled with a single transaction. I don't know why someone in the MySQL team thought that changing this operation that is hidden from users was a good idea. The data is contained in the server itself and it goes into a data dictionary table, also not visible to users. I am sure I can find at least two methods to load the data faster. I was told that this glitch will be fixed in the next release. I'm waiting.

Speaking of initialization, the mysql_install_db script has been removed for good in 8.0. If you are still using it instead of the recommended mysqld --initialize, you should adapt asap.

This list is far from being exhaustive. I recommend reading What's new in MySQL 8 before upgrading.
If you are impatient, dbdeployer can help you test MySQL 8 quickly and safely.

Wednesday, February 28, 2018

The confusing strategy for MySQL shell

Where the hell is it?

The MySQL shell is a potentially useful tool that has been intentionally made difficult to use properly.

It was introduced, with much fanfare, with the MySQL Document Store, as THE tool to bridge the SQL and no-SQL worlds. The release was less than satisfactory, though: MySQL 5.7.12 introduced a new feature (the X-protocol plugin) bundled with the server. The maturity of the plugin was unclear, as it popped out of the unknown into a GA release, without any public testing. It was allegedly GA quality, although the quantity of bug reports that were filed soon after the release proved otherwise. The maturity of the shell was known as "development preview", and so we had a supposedly GA feature that could only be used with an alpha quality tool.

The situation with the MySQL shell got worse in a few months. A new product was brewing (MySQL Group Replication) and went rapidly from something released in the Labs without docs to being part of the regular server distribution, and it was evolving into a more complex and ambitious project (the InnoDB Cluster) which used the MySQL shell as its main tool.

Since the announcement of InnoDB Cluster, using the MySQL shell has been a nightmare. You saw examples in blog posts and presentations, and when you tried them at home, they did not work. There were different releases of MySQL shell with the same version number but different capabilities, depending on whether they were released through the main downloads site or through the labs.

When I asked why the shell wasn't distributed with the server, like the other tools, I was told that a non-GA product could not be released with a GA server. Considering that the Document Store is still walking around with a Pre-Production status legal notice, this was an odd excuse.

Still, I kept waiting, trying to figure out how to pair a given version of MySQL shell with a given version of the server. Unlike the server, there are no release notes for the shell, so every release was a surprising experience.

Eventually, the MySQL shell reached the GA state, with which merit I can't tell. Given the obstacles in the path to its usage, I doubt it has had any serious testing from the community. Despite the state being GA, it keeps being released separately, leaving the puzzled users with the ungrateful task of determining with which server version that shell could be used safely.

With the upcoming release of MySQL 8.0, a new version of MySQL shell appeared, with a colorful prompt and new features that the GA shell doesn't have. The public perception of the tool keeps getting more confused. In the presentations given by the MySQL team we see the new shell doing wonders, while the GA shell keeps its monochromatic features. Shall I use the 8.0.x shell with a 5.7 server or should I stick with the 1.0 version?

In MySQL 8.0, the situation is still divided. Both products (the server and the shell) are, as of today, not GA yet. It would make sense to finally end the craziness and put the two things together, so that users don't have to hunt around for the right shell version. But the two products are still released separately.

How can I do stuff with MySQL shell?

So far, we have only seen the availability of the shell. What about the functionality?

I have heard that Oracle wants to convert the shell into the only tool to deal with MySQL. I can't prove it, as Oracle doesn't release its development plans to the public, but I can see the emphasis on the shell in talks and articles authored by MySQL team engineers. If this is the plan, I think it needs a lot more work.

If you try to use MySQL shell the same way as the regular "mysql" client, you get in trouble soon.

mysqlsh --user root --password=msandbox --port=5721 --host 127.0.0.1
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating a Session to 'root@127.0.0.1:5721'
Your MySQL connection id is 38
Server version: 5.7.21 MySQL Community Server (GPL)
No default schema selected; type \use  to set one.
MySQL Shell 1.0.11

Copyright (c) 2016, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help' or '\?' for help; '\quit' to exit.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.

I see two problems here:

The warning about the password on the command line is legitimate. The trouble is that there is no alternative. mysqlsh does not support --defaults-file, and there is no way of giving a password other than directly at invocation. There is an option "--passwords-from-stdin" which does not seem to work, and even if it did, I can't see the advantage of using the password from a pipe.
The default mode is Javascript. I can see that this makes operations simpler when you want to perform setup tasks for InnoDB Cluster, but certainly doesn't help me to use this tool as the primary drive for database management. There is a "--sql" option that does what I expect, but if this is not the default, I can't see this replacement being very successful.
Due to the previous items, using the tool in batch mode (with -e "SQL commands") is impossible, as every invocation will start with the freaking password warning.

I'm afraid that it's too late to take action for MySQL 8.0. The MySQL team is probably packaging the GA release while I write these notes. But I offer some suggestions nonetheless.

Wish list

Package MySQL shell with the server. Past experience shows that the MySQL team keeps adding features into a GA release, thus exposing users to the risk of getting the wrong tool for the job. Having the shell and the server in the same tarball will help users pick the right version for the task. This is similar to what happens with mysqldump: using the tool from 5.5 with a 5.7+ server will not work properly. There is no reason for mysqlsh to be treated differently.
Make sure that all the features of the mysql client work seamlessly in mysqlsh. Perhaps run the test suite replacing mysql with mysqlsh and pick up from there.
Make the MySQL shell compatible with other tools. Specifically, it should support option files (--defaults-file, --defaults-extra-file, --defaults-group-suffix, --no-defaults)

In short, if the plan is to replace mysql with mysqlsh, put the thing in the open, and please make sure it can do what users can reasonably expect.

Wednesday, February 21, 2018

Meet dbdeployer: the new sandbox maker

How it happened

A few years ago I started thinking about refactoring MySQL-Sandbox. I got lots of ideas and a name for the project (dbdeployer) but went no further. The initial idea (this was 2013!) was to rewrite the project in Ruby: I had been using Ruby at work and it looked like a decent replacement for Perl. My main problem was the difficulty of installation in an uncontrolled environment. If you have control over your environment (it's your laptop or you are in charge of the server configuration via Puppet or similar) then the task is easy. But if you ever need to deploy somewhere with little or no notice, it becomes a problem: there are servers where Perl is not installed, and is common that the server also have a policy forbidding all scripting languages from being deployed. Soon I found out that Ruby has the same problem as Perl. In the meantime, my work also required heavy involvement with Python, and I started thinking that maybe it would be a better choice than Ruby.
My adventures with deployment continued. In some places, I would find old versions of Perl, Ruby, Python, and no way of replacing them easily. I also realized that, if I bit the bullet and wrote my tools in C or C++, my distribution problems would not end, as I had to deal with library dependencies and conflict with existing ones.
At the end of 2017 I finally did what I had postponed for so long: I took a serious look at Go, and I decided that it was the best candidate for solving the distribution problem. I had a few adjustment problems, as the Go philosophy is different from my previously used languages, but the advantages were so immediate that I was hooked. Here's what I found compelling:

Shift in responsibility: with all the other languages I have used, the user is responsible for providing the working environment, such as installing libraries, the language itself, solve conflicts, and so on, until the program can work. With Go, the responsibility is on the developers only: they are supposed to know how to collect the necessary packages and produce a sound executable. Users only need to download the executable and run it.
Ease of deployment. A Go executable doesn't have dependencies. Binaries can be compiled for several platforms from a single origin (I can build Linux executables in my Mac and vice versa) and they just work.
Ease of development. Go is a strongly typed language, and has a different approach at code structure than Perl or Python. But this doesn't slow down my coding: it forces me to write better code, resulting in something that is at the same time more robust and easy to extend.
Wealth of packages. Go has an amazingly active community, and there is an enormous amount of packages ready for anything.

What is dbdeployer?

UPDATE 28-Feb-2018: The commands "single", "replication", and "multiple" are now subcommand of "deploy". Also, "templates" is now a subcommand of "defaults".

The first goal of dbdeployer is to replace MySQL-Sandbox completely. As such, it has all the main features of MySQL Sandbox, and many more (See the full list of features at the end of this text.)

You can deploy a single sandbox, or multiple unrelated sandboxes, or several servers in replication. That you could do also with MySQL-Sandbox. The first difference is in the command structure:

$ dbdeployer
dbdeployer makes MySQL server installation an easy task.
Runs single, multiple, and replicated sandboxes.

Usage:
  dbdeployer [command]

Available Commands:
  admin       administrative tasks
  delete      delete an installed sandbox
  global      Runs a given command in every sandbox
  help        Help about any command
  multiple    create multiple sandbox
  replication create replication sandbox
  sandboxes   List installed sandboxes
  single      deploys a single sandbox
  templates   Admin operations on templates
  unpack      unpack a tarball into the binary directory
  usage       Shows usage of installed sandboxes
  versions    List available versions

Flags:
      --base-port int                 Overrides default base-port (for multiple sandboxes)
      --bind-address string           defines the database bind-address  (default "127.0.0.1")
      --config string                 configuration file (default "$HOME/.dbdeployer/config.json")
      --custom-mysqld string          Uses an alternative mysqld (must be in the same directory as regular mysqld)
  -p, --db-password string            database password (default "msandbox")
  -u, --db-user string                database user (default "msandbox")
      --expose-dd-tables              In MySQL 8.0+ shows data dictionary tables
      --force                         If a destination sandbox already exists, it will be overwritten
      --gtid                          enables GTID
  -h, --help                          help for dbdeployer
  -i, --init-options strings          mysqld options to run during initialization
      --keep-auth-plugin              in 8.0.4+, does not change the auth plugin
      --keep-server-uuid              Does not change the server UUID
      --my-cnf-file string            Alternative source file for my.sandbox.cnf
  -c, --my-cnf-options strings        mysqld options to add to my.sandbox.cnf
      --port int                      Overrides default port
      --post-grants-sql strings       SQL queries to run after loading grants
      --post-grants-sql-file string   SQL file to run after loading grants
      --pre-grants-sql strings        SQL queries to run before loading grants
      --pre-grants-sql-file string    SQL file to run before loading grants
      --remote-access string          defines the database access  (default "127.%")
      --rpl-password string           replication password (default "rsandbox")
      --rpl-user string               replication user (default "rsandbox")
      --sandbox-binary string         Binary repository (default "$HOME/opt/mysql")
      --sandbox-directory string      Changes the default sandbox directory
      --sandbox-home string           Sandbox deployment direcory (default "$HOME/sandboxes")
      --skip-load-grants              Does not load the grants
      --use-template strings          [template_name:file_name] Replace existing template with one from file
      --version                       version for dbdeployer

Use "dbdeployer [command] --help" for more information about a command.

MySQL-Sandbox was created in 2006, and its structure changed as needed, without a real plan. dbdeployer, instead, was designed to have a hierarchical command structure, similar to git or docker, to give users a better feeling. As a result, it has a leaner set of commands, a non-awkward way of using options, and offers a better control of the operations out of the box.

For example, here's how we would start to run sandboxes:

$ dbdeployer --unpack-version=8.0.4 unpack mysql-8.0.4-rc-linux-glibc2.12-x86_64.tar.gz
Unpacking tarball mysql-8.0.4-rc-linux-glibc2.12-x86_64.tar.gz to $HOME/opt/mysql/8.0.4
.........100.........200.........292

The first (mandatory) operation is to expand binaries from a tarball. By default, the files will be expanded to $HOME/opt/mysql. Once this is done, we can create sandboxes at will, with simple commands:

$ dbdeployer single 8.0.4
Database installed in $HOME/sandboxes/msb_8_0_4
run 'dbdeployer usage single' for basic instructions'
. sandbox server started

$ dbdeployer replication 8.0.4
[...]
Replication directory installed in /$HOME/sandboxes/rsandbox_8_0_4
run 'dbdeployer usage multiple' for basic instructions'

$ dbdeployer multiple 8.0.4
[...]
Multiple directory installed in $HOME/sandboxes/multi_msb_8_0_4
run 'dbdeployer usage multiple' for basic instructions'

$ dbdeployer sandboxes
msb_8_0_4            : single                    8.0.4 [8004]
multi_msb_8_0_4      : multiple                  8.0.4 [24406 24407 24408]
rsandbox_8_0_4       : master-slave              8.0.4 [19405 19406 19407]

Three differences between dbdeployer and MySQL-Sandbox:

There is only one executable, with different commands;
After each deployment, there is a suggestion on how to get help about the sandbox usage.
There is a command that displays which sandboxes were installed, the kind of deployment, and the ports in use. This will be useful when the ports increase, as in group replication.

Here's another take, after deploying group replication:

$ dbdeployer sandboxes
group_msb_8_0_4      : group-multi-primary   8.0.4 [20405 20530 20406 20531 20407 20532]
group_sp_msb_8_0_4   : group-single-primary  8.0.4 [21405 21530 21406 21531 21407 21532]
msb_8_0_4            : single                8.0.4 [8004]
multi_msb_8_0_4      : multiple              8.0.4 [24406 24407 24408]
rsandbox_8_0_4       : master-slave          8.0.4 [19405 19406 19407]

A few more differences from MySQL-Sandbox are the "global" and "delete" commands.
The "global" command can broadcast a command to all the sandboxes. You can start, stop, restart all sandboxes at once, or run a query everywhere.

$ dbdeployer global use "select @@server_id, @@port, @@server_uuid"
# Running "use_all" on group_msb_8_0_4
# server: 1
@@server_id @@port  @@server_uuid
100 20405   00020405-1111-1111-1111-111111111111
# server: 2
@@server_id @@port  @@server_uuid
200 20406   00020406-2222-2222-2222-222222222222
# server: 3
@@server_id @@port  @@server_uuid
300 20407   00020407-3333-3333-3333-333333333333

# Running "use_all" on group_sp_msb_8_0_4
# server: 1
@@server_id @@port  @@server_uuid
100 21405   00021405-1111-1111-1111-111111111111
# server: 2
@@server_id @@port  @@server_uuid
200 21406   00021406-2222-2222-2222-222222222222
# server: 3
@@server_id @@port  @@server_uuid
300 21407   00021407-3333-3333-3333-333333333333

# Running "use" on msb_8_0_4
@@server_id @@port  @@server_uuid
1   8004    00008004-0000-0000-0000-000000008004
[...]

You can run the commands manually. dbdeployer usage will show which commands are available for every sandbox.

$ dbdeployer usage single

    USING A SANDBOX

Change directory to the newly created one (default: $SANDBOX_HOME/msb_VERSION
for single sandboxes)
[ $SANDBOX_HOME = $HOME/sandboxes unless modified with flag --sandbox-home ]

The sandbox directory of the instance you just created contains some handy
scripts to manage your server easily and in isolation.

"./start", "./status", "./restart", and "./stop" do what their name suggests.
start and restart accept parameters that are eventually passed to the server.
e.g.:

  ./start --server-id=1001

  ./restart --event-scheduler=disabled

"./use" calls the command line client with the appropriate parameters,
Example:

    ./use -BN -e "select @@server_id"
    ./use -u root

"./clear" stops the server and removes everything from the data directory,
letting you ready to start from scratch. (Warning! It's irreversible!)

When you don't need the sandboxes anymore, you can dismiss them with a single command:

$ dbdeployer delete ALL
Deleting the following sandboxes
$HOME/sandboxes/group_msb_8_0_4
$HOME/sandboxes/group_sp_msb_8_0_4
$HOME/sandboxes/msb_8_0_4
$HOME/sandboxes/multi_msb_8_0_4
$HOME/sandboxes/rsandbox_8_0_4
Do you confirm? y/[N]

There is an option to skip the confirmation, which is useful for scripting unattended tests.

Customization

One of the biggest problems with MySQL-Sandbox was that most of the functioning is hard-coded, and the scripts needed to run the sandboxes are generated in different places, so that extending or modifying features became more and more difficult. When I designed dbdeployer, I gave myself the goal of making the tool easy to change, and the code easy to understand and extend.

For this reason, I organized everything related to code generation (the scripts that initialize and run the sandboxes) in a collection of templates and default variables that are publicly visible and modifiable.

$ dbdeployer templates -h
The commands in this section show the templates used
to create and manipulate sandboxes.

Usage:
  dbdeployer templates [command]

Aliases:
  templates, template, tmpl, templ

Available Commands:
  describe    Describe a given template
  export      Exports all templates to a directory
  import      imports all templates from a directory
  list        list available templates
  reset       Removes all template files
  show        Show a given template

You can list the templates on the screen.

$ dbdeployer templates list single
    [single]      replication_options       : Replication options for my.cnf
    [single]      load_grants_template      : Loads the grants defined for the sandbox
    [single]      grants_template57         : Grants for sandboxes from 5.7+
    [single]      grants_template5x         : Grants for sandboxes up to 5.6
    [single]      my_template               : Prefix script to run every my* command line tool
    [single]      show_binlog_template      : Shows a binlog for a single sandbox
    [single]      use_template              : Invokes the MySQL client with the appropriate options
    [single]      clear_template            : Remove all data from a single sandbox
    [single]      restart_template          : Restarts the database (with optional mysqld arguments)
    [single]      start_template            : starts the database in a single sandbox (with optional mysqld arguments)
    [single]      stop_template             : Stops a database in a single sandbox
    [single]      send_kill_template        : Sends a kill signal to the database
    [single]      show_relaylog_template    : Show the relaylog for a single sandbox
    [single]      Copyright                 : Copyright for every sandbox script
    [single]      expose_dd_tables          : Commands needed to enable data dictionary table usage
    [single]      init_db_template          : Initialization template for the database
    [single]      grants_template8x         : Grants for sandboxes from 8.0+
    [single]      add_option_template       : Adds options to the my.sandbox.cnf file and restarts
    [single]      test_sb_template          : Tests basic sandbox functionality
    [single]      sb_include_template       : TBD
    [single]      gtid_options              : GTID options for my.cnf
    [single]      my_cnf_template           : Default options file for a sandbox
    [single]      status_template           : Shows the status of a single sandbox

Then it's possible to examine template contents:

$ dbdeployer templates describe --with-contents init_db_template
# Collection    : single
# Name          : init_db_template
# Description   : Initialization template for the database
# Notes         : This should normally run only once
# Length        : 656
##START init_db_template
#!/bin/bash
        {{.Copyright}}
        # Generated by dbdeployer {{.AppVersion}} using {{.TemplateName}} on {{.DateTime}}
        BASEDIR={{.Basedir}}
        export LD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$LD_LIBRARY_PATH
        export DYLD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$DYLD_LIBRARY_PATH
        SBDIR={{.SandboxDir}}
        DATADIR=$SBDIR/data
        cd $SBDIR
        if [ -d $DATADIR/mysql ]
        then
            echo "Initialization already done."
            echo "This script should run only once."
            exit 0
        fi

        {{.InitScript}} \
            {{.InitDefaults}} \
            --user={{.OsUser}} \
            --basedir=$BASEDIR \
            --datadir=$DATADIR \
            --tmpdir={{.Tmpdir}} {{.ExtraInitFlags}}

##END init_db_template

The one above is the template that generates the initialization script. In MySQL-Sandbox, this was handled in the code, and it was difficult to figure out what went wrong when the initialization failed. The Go language has an excellent support for code generation using templates, and with just a fraction of its features I implemented a few dozen scripts which I am able to modify with ease. Here's what the deployed script looks like

#!/bin/bash

#    DBDeployer - The MySQL Sandbox
#    Copyright (C) 2006-2018 Giuseppe Maxia
#
#    Licensed under the Apache License, Version 2.0 (the "License");
#    you may not use this file except in compliance with the License.
#    You may obtain a copy of the License at
#
#        http://www.apache.org/licenses/LICENSE-2.0
#
#    Unless required by applicable law or agreed to in writing, software
#    distributed under the License is distributed on an "AS IS" BASIS,
#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#    See the License for the specific language governing permissions and
#    limitations under the License.

# Generated by dbdeployer 0.1.24 using init_db_template on Tue Feb 20 14:45:29 CET 2018
BASEDIR=$HOME/opt/mysql/8.0.4
export LD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$DYLD_LIBRARY_PATH
SBDIR=$HOME/sandboxes/msb_8_0_4
DATADIR=$SBDIR/data
cd $SBDIR
if [ -d $DATADIR/mysql ]
then
 echo "Initialization already done."
 echo "This script should run only once."
 exit 0
fi

$HOME/opt/mysql/8.0.4/bin/mysqld \
    --no-defaults \
    --user=$USER \
    --basedir=$BASEDIR \
    --datadir=$DATADIR \
    --tmpdir=$HOME/sandboxes/msb_8_0_4/tmp \
    --initialize-insecure --default_authentication_plugin=mysql_native_password

Let's see the quick-and-dirty usage. If you want to change a template and use it just once, do the following:

$ dbdeployer templates show init_db_template
Save it to a file init_db.txt and edit it. Be careful, though: removing or altering essential labels may block the sandbox initialization.
Use the template file in the next command:

$ dbdeployer single 8.0.4 --use-template=init_db_template:init_db.txt

For more permanent results, when you'd like to change a template, or several ones, permanently, you can use the export/import commands

List the templates related to replication (dbdeployer templates list replication)
Export the templates to the directory "mydir" $ dbdeployer templates export replication mydir
edit the templates you want to change inside "mydir/replication"
Import the templates dbdeployer templates import replication mydir

The templates will end inside $HOME/.dbdeployer/templates_$DBDEPLOYER_VERSION and dbdeployer will load then instead of using the ones stored internally. The next time that one of those templates will be needed, it will be collected from the file. If you run dbdeployer templates list or describe, the ones saved to file will be marked with {F}.
To go back to the built-in behavior, simply run dbdeployer templates reset

In addition to templates, dbdeployer uses a set of values when creating sandboxes. Like templates, this set is used from internal store, but it can be exported to a configuration file.

$ dbdeployer admin show
# Internal values:
{
  "version": "0.1.24",
  "sandbox-home": "$HOME/sandboxes",
  "sandbox-binary": "$HOME/opt/mysql",
  "master-slave-base-port": 11000,
  "group-replication-base-port": 12000,
  "group-replication-sp-base-port": 13000,
  "multiple-base-port": 16000,
  "group-port-delta": 125,
  "sandbox-prefix": "msb_",
  "master-slave-prefix": "rsandbox_",
  "group-prefix": "group_msb_",
  "group-sp-prefix": "group_sp_msb_",
  "multiple-prefix": "multi_msb_"
}

The values named *-base-port are used to calculate the port for each node in a multiple deployment. The calculation goes:

sandbox_port + base_port + (revision_number * 100)

So, for example, when deploying replication for 5.7.21, the sandbox port would be 5721, and the final base port will be calculated as follows:

5721 + 11000 + 21 * 100 = 18821

This number will be incremented for each node in the cluster, so that the master will get 18822, and the first slave 18823.

Using the commands dbdeployer admin export and import you can customize the default values in a way similar to what we saw for the templates.

Thanks

I'd like to thank:

Ronald Bradford and René Cannaò, for priceless advice on the usability when the tool was in its early stage of development;
Shlomi Noach, for adopting dbdeployer even before it was feature complete;
Frédéric "lefred" Descamps, for allowing me to present dbdeployer at the pre-FOSDEM MySQL event;
The Go community, for the exciting environment offered to newcomers.

A note about unpacking MySQL tarball

When using MySQL tarballs, we may have some problems due to the enormous size that the tarballs have reached. Look at this:

690M    5.5.52
1.2G    5.6.39
2.5G    5.7.21
3.6G    8.0.0
1.3G    8.0.1
1.5G    8.0.2
1.9G    8.0.3
1.9G    8.0.4

This becomes a serious problem when you want to unpack the tarball inside a low-resource virtual machine or a Docker container. I have asked the MySQL team to provide reduced tarballs, possibly in a fixed location, so that sandboxes creation could be fully automated. I was told that something will be done soon. In the meantime, I provide such reduced tarballs, which have a more reasonable size:

 49M    5.5.52
 61M    5.6.39
346M    5.7.21
447M    8.0.0
462M    8.0.1
254M    8.0.2
270M    8.0.3
244M    8.0.4

Using these reduced tarballs, which are conveniently packed in a docker container (datacharmer/mysql-sb-full contains all major MySQL versions), I have automated dbdeployer tests with minimal storage involvement, and that improves the test speed as well.

Detailed list of features

Feature	MySQL-Sandbox	dbdeployer	dbdeployer planned
Single sandbox deployment	yes	yes
unpack command	sort of ¹	yes
multiple sandboxes	yes	yes
master-slave replication	yes	yes
"force" flag	yes	yes
pre-post grants SQL action	yes	yes
initialization options	yes	yes
my.cnf options	yes	yes
custom my.cnf	yes	yes
friendly UUID generation	yes	yes
global commands	yes	yes
test replication flow	yes	yes
delete command	yes ²	yes
group replication SP	no	yes
group replication MP	no	yes
prevent port collision	no	yes ³
visible initialization	no	yes ⁴
visible script templates	no	yes ⁵
replaceable templates	no	yes ⁶
configurable defaults	no	yes ⁷
list of source binaries	no	yes ⁸
list of installed sandboxes	no	yes ⁹
test script per sandbox	no	yes ¹⁰
integrated usage help	no	yes ¹¹
custom abbreviations	no	yes ¹²
version flag	no	yes ¹³
fan-in	no	no	yes ¹⁴
all-masters	no	no	yes ¹⁵
Galera/PXC/NDB	no	no	yes ¹⁸
finding free ports	yes	no	yes
pre-post grants shell action	yes	no	maybe
getting remote tarballs	yes	no	yes
circular replication	yes	no	no ¹⁶
master-master (circular)	yes	no	no
Windows support	no	no	no ¹⁷

It's achieved using --export_binaries and then abandoning the operation. ↩
Uses the sbtool command ↩
dbdeployer sandboxes store their ports in a description JSON file, which allows the tool to get a list of used ports and act before a conflict happens. ↩
The initialization happens with a script that is generated and stored in the sandbox itself. Users can inspect the init_db script and see what was executed. ↩
All sandbox scripts are generated using templates, which can be examined and eventually changed and re-imported. ↩
See also note 5. Using the flag --use-template you can replace an existing template on-the-fly. Group of templates can be exported and imported after editing. ↩
Defaults can be exported to file, and eventually re-imported after editing. ↩
This is little more than using an O.S. file listing, with the added awareness of the source directory. ↩
Using the description files, this command lists the sandboxes with their topology and used ports. ↩
It's a basic test that checks whether the sandbox is running and is using the expected port. ↩
The "usage" command will show basic commands for single and multiple sandboxes. ↩
The abbreviations file allows user to define custom shortcuts for frequently used commands. ↩
Strangely enough, this simple feature was never implemented for MySQL-Sandbox, while it was one of the first additions to dbdeployer. ↩
Will use the multi source technology introduced in MySQL 5.7. ↩
Same as n. 13. ↩
Circular replication should not be used anymore. There are enough good alternatives (multi-source, group replication) to avoid this old technology. ↩
I don't do Windows, but you can fork the project if you do. ↩
For Galera/PXC and MySQL Cluster I have ideas, but I may need help to implement. ↩