
Sunday, August 25, 2019

One upgrade to rule them all



Up to now, the way of updating dbdeployer was the same as installing it for the first time, i.e. looking at the releases page, downloading the binaries for your operating system, unpacking them, and finally replacing the existing binaries.

This is not the procedure I follow, however, as for me updating simply means compiling the latest version I have just finished coding. For this reason, when Simon Mudd mentioned to me that dbdeployer should update itself over the Internet, I didn’t immediately grasp the concept. But then he talked to me again, and he even coded a sample script that does what he suggested: look for the latest release, download it, and replace the current executable with the latest one.

The only problem with that approach is that dbdeployer would then need to be distributed with one binary and one update script (and we would have to make sure that tools such as curl and gunzip are available on the host computer), while I have pushed the concept that the binary executable should be the only thing needed. Thus I spent a few hours converting the script’s ideas into dbdeployer’s internal workings, and given that all the components were already in place (such as downloading files from the internet and unpacking them), it was done quickly. As a result, dbdeployer can now look online for newer versions of itself, and get its own replacement painlessly.

The new functionality is dbdeployer update, which works almost silently to do what you expect if the version online is higher than the local one, and protects you against accidentally fetching an outdated version or binaries for the wrong operating system.

This is the last update that you need to do manually. After that, the next ones can be fetched directly with dbdeployer itself.

The new command has several options, mostly added for the advanced user:

$ dbdeployer update -h
Updates dbdeployer in place using the latest version (or one of your choice)

Usage:
  dbdeployer update [version] [flags]

Examples:

$ dbdeployer update
# gets the latest release, overwrites current dbdeployer binaries

$ dbdeployer update --dry-run
# shows what it will do, but does not do it

$ dbdeployer update --new-path=$PWD
# downloads the latest executable into the current directory

$ dbdeployer update v1.34.0 --force-old-version
# downloads dbdeployer 1.34.0 and replace the current one
# (WARNING: a version older than 1.36.0 won't support updating)


Flags:
      --OS string           Gets the executable for this Operating system
      --docs                Gets the docs version of the executable
      --dry-run             Show what would happen, but don't execute it
      --force-old-version   Force download of older version
  -h, --help                help for update
      --new-path string     Download updated dbdeployer into a different path
      --verbose             Gives more info


The option that also benefits casual users is --dry-run: it shows which release the executable would be fetched from, without actually changing anything.
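
A cautious upgrade routine, chaining only commands shown in this post (a sketch, not a prescription):

$ dbdeployer --version              # check the currently installed version
$ dbdeployer info releases latest   # review what the latest release contains
$ dbdeployer update --dry-run       # preview the download and replacement
$ dbdeployer update                 # perform the actual update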


You may ask: why would I want to update using an older version? Although at first sight it does not seem a sound idea, there are at least two reasons for it:


  1. there was a bug in the latest release. It isn’t an uncommon occurrence, and much as I would like to claim my code is bug free, I know that’s a pipe dream. So, if a bug is found that affects your workflow, a safe course of action is getting the previous release.
  2. From time to time, I publish experimental releases out of a branch. You may want to try them out, and then go back to the comfort of the latest non-experimental one.

If you don’t know what the releases contain, you can look them up without using a browser, with the command dbdeployer info releases. For example:

$ dbdeployer info releases latest
--------------------------------------------------------------------------------
Remote version: v1.36.1
Release:        dbdeployer 1.36.1
Date:           2019-08-18T13:23:53Z
## BUG FIXES

- Fix a bug in 'dbdeployer update': it only worked with --verbose enabled
- Fix output of 'dbdeployer update': it now confirms that the update has
   happened

--------------------------------------------------------------------------------
 dbdeployer-1.36.1-docs.linux.tar.gz (5.9 MB)
 dbdeployer-1.36.1-docs.osx.tar.gz (6.1 MB)
 dbdeployer-1.36.1.linux.tar.gz (5.8 MB)
 dbdeployer-1.36.1.osx.tar.gz (5.9 MB)

There is another feature that complements dbdeployer update: the ability to set up command-line completion for dbdeployer.

$ dbdeployer defaults enable-bash-completion -h
Enables bash completion using either a local copy of dbdeployer_completion.sh or a remote one

Usage:
  dbdeployer defaults enable-bash-completion [flags]

Flags:
      --completion-file string   Use this file as completion
  -h, --help                     help for enable-bash-completion
      --remote                   Download dbdeployer_completion.sh from GitHub
      --remote-url string        Where to downloads dbdeployer_completion.sh from (default "https://raw.githubusercontent.com/datacharmer/dbdeployer/master/docs/dbdeployer_completion.sh")
      --run-it                   Run the command instead of just showing it

This command retrieves the latest completion file, and helps you install it in the right place. By default, it gets the file and doesn’t change anything in your bash_completion directory: it tells you which commands to run. You can let it do the copying (which may require sudo access) using --run-it.
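
For example, a remote-based setup could look like the following (a sketch: the final destination of the completion file is system-dependent and assumed here):

$ dbdeployer defaults enable-bash-completion --remote
# the command prints the exact copy instructions; add --run-it to let it
# execute them (possibly with sudo). A typical last step, with an assumed path:
$ source /etc/bash_completion.d/dbdeployer_completion.sh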

Tuesday, March 19, 2019

dbdeployer community: Part 3 - MySQL Cluster (NDB)

I remember wanting to create MySQL Cluster sandboxes several years ago. By then, however, MySQL-Sandbox technology was not flexible enough to allow an easy inclusion, and the cluster software itself was not as easy to install as it is today. Thus, I kept postponing the implementation, until I started working with dbdeployer.

I included the skeleton of support for MySQL Cluster from the beginning (by keeping a range of ports dedicated to this technology), but I didn’t do anything about it until June 2018, when I made my intentions to add NDB support public with issue #20 (Add support for MySQL Cluster). The issue contained just a bare idea, but I needed help from someone, as my expertise with NDB was limited, and outdated.

Help came in November, when Daniël van Eeden started giving me bits of information on how to put together a cluster sandbox. I still resisted trying my hand at the implementation, because by then I had realised that my method of checking the database server version to know whether it supported a given feature was inadequate to support anything other than vanilla MySQL or fully compliant forks.

The game changer was the cooperation with TiDB that opened the way for supporting Percona XtraDB Cluster. Even though these technologies are way different from MySQL Cluster, they forced me to improve dbdeployer’s code, making it more flexible and easier to enhance.

When I finally decided to start working on NDB, it took me only a few days to implement it, because I had all the pieces ready for this technology to become part of dbdeployer.

Following Daniël’s instructions, I had a prototype working, which I submitted to the #dbdeployer channel on MySQL Community Slack. In that channel, I got help again from Daniël van Eeden, and then Frédéric Descamps summoned two more experts (Ted Wennmark and Bernd Ocklin), who gave me feedback, advice, and some quick lessons on how the cluster should work, which allowed me to publish a release (dbdeployer 1.23.0) this past weekend.

The implementation has some peculiarities for users of both dbdeployer and MySQL Cluster. For the ones used to dbdeployer, the biggest change is that we are deploying two entities, of which the main one is an NDB cluster, with its own directories and processes, while the MySQL servers are just the visible part of the cluster and are, in fact, only cluster clients. Still, the cluster works smoothly within dbdeployer’s paradigm: the cluster is deployed (like Group Replication or PXC) as a replication topology, and as such we can run the standard replication test and expect the same result that we would see when checking another multi-source deployment.

For people used to NDB, though, seeing NDB as “replication” feels odd, because the cluster is seen as a distinct entity, and replication is when we transfer data between two clusters. If we were developing a dedicated tool for NDB clusters, this is probably what we would have done, but since we want dbdeployer integration, we must play by the general rules of the tool, where “single” is a stand-alone server instance, and we can’t have dbdeployer deploy single --topology=ndb, because a single instance doesn’t have a topology, which is a property of a group of entities. Therefore, the price to pay for dbdeployer support is accepting to see a MySQL Cluster deployment as replication.

Now that we have covered the philosophical angle, it’s time to show an example. Unlike PXC, which requires Linux, MySQL Cluster can also run on MacOS, which makes my testing much easier.

The first step to run a cluster in dbdeployer is to download a tarball from dev.mysql.com/downloads/cluster, and then expand it in our usual directory ($HOME/opt/mysql):

$ dbdeployer unpack --prefix=ndb --flavor=ndb \  
    ~/Downloads/mysql-cluster-gpl-7.6.9-macos10.14-x86_64.tar.gz  
Unpacking tarball $HOME/Downloads/mysql-cluster-gpl-7.6.9-macos10.14-x86_64.tar.gz to $HOME/opt/mysql/ndb7.6.9  
[...]  
Renaming directory $HOME/opt/mysql/mysql-cluster-gpl-7.6.9-macos10.14-x86_64 to $HOME/opt/mysql/ndb7.6.9

We can repeat the same operation for MySQL Cluster 8.0.14, and in the end we will have two expanded tarballs named ndb7.6.9 and ndb8.0.14. With these we can install a few clusters in the same host:

$ dbdeployer deploy replication ndb7.6 --topology=ndb --concurrent  
# ndb7.6 => ndb7.6.9  
$HOME/sandboxes/ndb_msb_ndb7_6_9/initialize_nodes  
MySQL Cluster Management Server mysql-5.7.25 ndb-7.6.9  
2019-03-18 23:47:15 [ndbd] INFO     -- Angel connected to 'localhost:20900'  
2019-03-18 23:47:16 [ndbd] INFO     -- Angel allocated nodeid: 2  
2019-03-18 23:47:16 [ndbd] INFO     -- Angel connected to 'localhost:20900'  
2019-03-18 23:47:16 [ndbd] INFO     -- Angel allocated nodeid: 3  
executing 'start' on node 1  
................ sandbox server started  
executing 'start' on node 2  
.. sandbox server started  
executing 'start' on node 3  
.. sandbox server started  
NDB cluster directory installed in $HOME/sandboxes/ndb_msb_ndb7_6_9  
run 'dbdeployer usage multiple' for basic instructions'


$ dbdeployer deploy replication ndb8.0 --topology=ndb --concurrent  
# ndb8.0 => ndb8.0.14  
$HOME/sandboxes/ndb_msb_ndb8_0_14/initialize_nodes  
MySQL Cluster Management Server mysql-8.0.14 ndb-8.0.14-dmr  
2019-03-18 23:45:53 [ndbd] INFO     -- Angel connected to 'localhost:21400'  
2019-03-18 23:45:53 [ndbd] INFO     -- Angel allocated nodeid: 2  
2019-03-18 23:45:53 [ndbd] INFO     -- Angel connected to 'localhost:21400'  
2019-03-18 23:45:53 [ndbd] INFO     -- Angel allocated nodeid: 3  
executing 'start' on node 1  
........ sandbox server started  
executing 'start' on node 2  
... sandbox server started  
executing 'start' on node 3  
.. sandbox server started  
NDB cluster directory installed in $HOME/sandboxes/ndb_msb_ndb8_0_14  
run 'dbdeployer usage multiple' for basic instructions'

If we look at the sandbox directories, we will see a few more subdirectories than we usually have with other topologies. For example:

 ndb_conf   # cluster configuration  
 ndbnode1   # management node (1)  
 ndbnode2   # data node (2)  
 ndbnode3   # data node (3)  
 node1      # MySQL node 1  
 node2      # MySQL node 2  
 node3      # MySQL node 3

The clusters are well framed into dbdeployer’s architecture, and they respond to standard commands like any other sandbox:

$ dbdeployer sandboxes --full-info  
.-------------------.------.-----------.----------------------------------------------.--------.-------.--------.  
|       name        | type |  version  |                    ports                     | flavor | nodes | locked |  
+-------------------+------+-----------+----------------------------------------------+--------+-------+--------+  
| ndb_msb_ndb7_6_9  | ndb  | ndb7.6.9  | [20900 27510 27511 27512 ]                   | ndb    |     3 |        |  
| ndb_msb_ndb8_0_14 | ndb  | ndb8.0.14 | [21400 28415 38415 28416 38416 28417 38417 ] | ndb    |     3 |        |  
'-------------------'------'-----------'----------------------------------------------'--------'-------'--------'

$ dbdeployer global status  
# Running "status_all" on ndb_msb_ndb7_6_9  
MULTIPLE  /Users/gmax/sandboxes/ndb_msb_ndb7_6_9  
node1 : node1 on  -  port   27510 (27510)  
node2 : node2 on  -  port   27511 (27511)  
node3 : node3 on  -  port   27512 (27512)

# Running "status_all" on ndb_msb_ndb8_0_14  
MULTIPLE  /Users/gmax/sandboxes/ndb_msb_ndb8_0_14  
node1 : node1 on  -  port   28415 (28415)  
node2 : node2 on  -  port   28416 (28416)  
node3 : node3 on  -  port   28417 (28417)

$ dbdeployer global test-replication  
# Running "test_replication" on ndb_msb_ndb7_6_9  
# master 1  
# master 2  
# master 3  
# slave 1  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 2  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 3  
ok - '3' == '3' - Slaves received tables from all masters  
# pass: 3  
# fail: 0

# Running "test_replication" on ndb_msb_ndb8_0_14  
# master 1  
# master 2  
# master 3  
# slave 1  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 2  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 3  
ok - '3' == '3' - Slaves received tables from all masters  
# pass: 3  
# fail: 0

Like other topologies, the NDB cluster also has a script that shows the status of the nodes:

$ ~/sandboxes/ndb_msb_ndb7_6_9/check_nodes  
+---------+-----------+---------------+--------+---------+-------------+-------------------+  
| node_id | node_type | node_hostname | uptime | status  | start_phase | config_generation |  
+---------+-----------+---------------+--------+---------+-------------+-------------------+  
|       2 | NDB       | localhost     |     58 | STARTED |           0 |                 1 |  
|       3 | NDB       | localhost     |     58 | STARTED |           0 |                 1 |  
|       1 | MGM       | localhost     |   NULL | NULL    |        NULL |              NULL |  
|       4 | API       | localhost     |   NULL | NULL    |        NULL |              NULL |  
|       5 | API       | localhost     |   NULL | NULL    |        NULL |              NULL |  
|       6 | API       | localhost     |   NULL | NULL    |        NULL |              NULL |  
|       7 | API       | localhost     |   NULL | NULL    |        NULL |              NULL |  
+---------+-----------+---------------+--------+---------+-------------+-------------------+  
Connected to Management Server at: localhost:20900  
Cluster Configuration  
---------------------  
[ndbd(NDB)] 2 node(s)  
id=2    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0, *)  
id=3    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)  
id=1    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)

[mysqld(API)]   4 node(s)  
id=4    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)  
id=5    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)  
id=6    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)  
id=7 (not connected, accepting connect from localhost)

It is possible that we will need more iterations to make the deployment more robust. When testing it, keep in mind that this deployment is only for testing, and it probably won’t have all the performance that you may find in a well-deployed production cluster. Still, compared to other topologies, the replication tests performed faster than I expected.

Friday, March 08, 2019

dbdeployer community - Part 2: Percona XtraDB Cluster

This was not on the radar. I have never been proficient in Galera clusters and related technologies, and thus I hadn’t given much thought to Percona XtraDB Cluster (PXC), until Alkin approached me at FOSDEM and proposed extending dbdeployer to support PXC. He mentioned that many support engineers at Percona use dbdeployer on a daily basis and that the addition of PXC would be welcome.

I could not follow up much during the conference, but we agreed on making a proof-of-concept in an indirect way: if several nodes of PXC can run in the same host using shell scripts, dbdeployer could reproduce that behavior.

A few weeks later, when dbdeployer had already been enhanced with flavors and capabilities, I got the script that can deploy several nodes in the same host. It’s a simplification of the ones used in Percona PXC tests, which got me started.

I followed a method similar to the one I used for MySQL Group Replication. The technology is similar, although the MySQL team used a different approach for the installation. The basic principle is that the cluster needs two ports per node: in addition to the regular MySQL port, there is a communication port (SST, or State Snapshot Transfer, port) that is needed to exchange cluster data. Using this information, and following the sample in the script, I could produce a prototype that, surprisingly, worked on the first try!

The cluster did deploy, and the replication test, which comes free of charge when you implement a replication-type sandbox using standard templates, worked flawlessly.

Then I hooked the deployment method into dbdeployer’s concurrency engine, which is able to deploy several nodes at once. Here I hit the first problem. In PXC, the nodes are not equal at startup. The first node needs to be initialised without the other nodes’ addresses, and it becomes the reference for the other nodes to join the cluster. If I provided complete references for all nodes (as I do for MySQL Group Replication), it didn’t work.

After some talk with Percona engineers on Slack, I figured out that the nodes can be deployed together, and the second and third nodes will just wait for the first one to come online and then join. That worked in principle, that is, when I deployed sequentially, but not when the nodes were deployed all at once. Fortunately, dbdeployer has several ways of enabling debugging output, and after a few unsuccessful attempts I found the reason: PXC initialisation happens using rsync on port 4444. When the nodes are started sequentially, the receiving node takes control of port 4444 without conflicts, gets the job done, and releases the port. When we deploy all nodes at once, there is a race for possession of the synchronisation port, and a random node will win it, leaving the others waiting forever.

Thus, I modified the installation to allocate a different rsync port for each node, and after that the concurrent installation worked as well.

The last obstacle was the discovery that there is yet another port (IST, or Incremental State Transfer port), which is always one number bigger than the SST port. Thus, if the SST port is, say, 5555, the IST port is set to 5556. This means that, unlike other dbdeployer clusters, I can’t assign port numbers incrementally, but need to set them with an interval. I did that, and the cluster came up with a default allocation of four ports per node (MySQL, rsync, SST, IST). If we also enable MySQLX, which comes included since PXC binaries are based on MySQL 5.7, we get 5 ports per node, and a majestic 15 ports for a three-node cluster.
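
To illustrate the arithmetic (with made-up numbers, not dbdeployer's actual allocation code): spacing the SST ports two numbers apart leaves room for each implicit IST port.

# assumed starting port, for illustration only
base_sst=26352
for node in 1 2 3; do
  sst=$(( base_sst + (node - 1) * 2 ))   # leave a gap for the IST port
  ist=$(( sst + 1 ))                     # IST is always SST + 1
  echo "node $node: SST=$sst IST=$ist"
done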

Anyway, the support for Percona XtraDB Cluster is available in dbdeployer 1.21.0. Let’s see a sample session to use the new functionality.

$ dbdeployer --version
dbdeployer version 1.21.0

$ dbdeployer unpack --prefix=pxc ~/downloads/Percona-XtraDB-Cluster-5.7.25-rel28-31.35.1.Linux.x86_64.ssl100.tar.gz  
[...]  
Renaming directory $HOME/opt/mysql/Percona-XtraDB-Cluster-5.7.25-rel28-31.35.1.Linux.x86_64.ssl100 to $HOME/opt/mysql/pxc5.7.25

Before trying the cluster, it would be a good practice to make sure that your system can deploy a single node.

$ dbdeployer deploy single pxc5.7.25  
Database installed in $HOME/sandboxes/msb_pxc5_7_25  
run 'dbdeployer usage single' for basic instructions'  
. sandbox server started

$ $HOME/sandboxes/msb_pxc5_7_25/test_sb  
[...]  
# Tests :    11  
# PASS  :    11  
# fail  :     0

$ dbdeployer delete msb_pxc5_7_25  
[...]

And now for the real test:

$ dbdeployer deploy replication --topology=pxc pxc5.7.25  
Installing and starting node 1  
. sandbox server started  
Installing and starting node 2  
...... sandbox server started  
Installing and starting node 3  
..... sandbox server started  
Replication directory installed in $HOME/sandboxes/pxc_msb_pxc5_7_25  
run 'dbdeployer usage multiple' for basic instructions'

We should now see all the allocated ports.

$ dbdeployer sandboxes --header  
            name                       type             version                                      ports  
---------------------------- ------------------------ ----------- ----------------------------------------------------------------------------  
 pxc_msb_pxc5_7_25        :   Percona-Xtradb-Cluster   pxc5.7.25   [26226 26352 26353 26364 26227 26354 26355 26365 26228 26356 26357 26366 ]

If we want more detail, we can look at the sandbox description file:

$ cat $HOME/sandboxes/pxc_msb_pxc5_7_25/sbdescription.json  
{
    "basedir": "$HOME/opt/mysql/pxc5.7.25",  
    "type": "Percona-Xtradb-Cluster",  
    "version": "pxc5.7.25",  
    "flavor": "pxc",  
    "port": [  
        26226,  
        26352,  
        26353,  
        26364,  
        26227,  
        26354,  
        26355,  
        26365,  
        26228,  
        26356,  
        26357,  
        26366  
    ],  
    "nodes": 3,  
    "node_num": 0,  
    "dbdeployer-version": "1.21.0",  
    "timestamp": "Thu Mar  7 17:20:03 CET 2019",  
    "command-line": "dbdeployer deploy replication --topology=pxc pxc5.7.25"  
 }

Now we can run the replication test. Given that we have a cluster where all nodes are masters, the test will create a table in each node, and read the result in each slave (again, each node):

$ $HOME/sandboxes/pxc_msb_pxc5_7_25/test_replication  
# master 1  
# master 2  
# master 3  
# slave 1  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 2  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 3  
ok - '3' == '3' - Slaves received tables from all masters  
# pass: 3  
# fail: 0

It’s a simple test, but it tells us that the cluster is fully functional.

Thanks to Alkin and other Percona engineers who have tested the prototype in real time.

Wednesday, March 06, 2019

dbdeployer community - Part 1: TiDB

After a conference, when I take stock of what I have learned, I usually realise that the best achievements are the result of interacting with other attendees during the breaks, rather than simply listening to the lectures. It might be because I follow the blogosphere closely, and thus the lectures have few surprises in store for me, or perhaps because many geeks take the conference as an excuse to refresh dormant friendships, catch up on technical gossip, and ask their friends some questions that were too sensitive to be discussed over Twitter and had been waiting for the chance of an in-person meeting to see the light of day.

I surely had some such questions, and I took advantage of the conference to ask them. As often happens, I got satisfactory responses, but the latest FOSDEM conference was different from usual, because the best experience came from the questions that others asked me.

As it turned out, others were waiting for a chance to discuss things over coffee or food, and I saw that my pet project (dbdeployer) is a lot more popular than I thought, and is being used silently in several environments. This should not be surprising if you read MySQL bug reports at bugs.mysql.com, where the use of sandboxes to reproduce user issues is common. Anyway, I got some praise, some requests, a few ideas for improvements, advance notice of an incoming graphical interface, and a few concrete collaboration proposals.

One such proposal came from Morgan Tocker, who suggested enhancing dbdeployer to support TiDB. At first, it seemed uninteresting, as TiDB is designed to be distributed, and installing just a component didn’t immediately look useful. However, Morgan pointed out that it could be used as a tool to test compatibility with existing applications, and as such it could gain much more value than I initially thought. We decided to try a quick hackathon to make a proof of concept.

It was a great pleasure to figure out, in just over one hour of close interaction, that dbdeployer’s design for flexibility was up to the task. We managed to make TiDB work with dbdeployer simply by exporting, editing, and re-loading a few templates.

The exercise showed strengths and limitations in both projects. We agreed that dbdeployer had to lose some assumptions (such as “I am working with a MySQL server”) and become able to recognise which flavor of MySQL-lookalike we are dealing with. At the same time, we noted that TiDB is not compatible when it comes to deployment and bootstrap: it is so simple and straightforward that its initialisation doesn’t fit in the complex operation that is a MySQL server warm-up.

Pleased with the initial success, we kept in touch and, after dbdeployer acquired the ability to tell one flavor from another, we put together the various pieces to make dbdeployer recognise and install TiDB. We found and fixed several bugs in both projects, and finally released dbdeployer 1.19.0, which can use a TiDB server transparently.

What does transparently mean? It means that tests for TiDB deployment can run alongside tests for other MySQL servers, and the sandbox scripts (such as start, stop, use, status, and test_sb) work as expected and produce a compatible output. Thus, there is a TiDB test running together with another dozen MySQL versions.

Now, if you want, you can evaluate TiDB on your computer without installing the full stack. It won’t be as fast as the real thing: what is installed as a single node is a slower emulation of the real database, but it is enough to give you an idea of which queries you can and cannot run in TiDB, and perhaps to see whether your application could run on TiDB at all.

The collaboration with TiDB was especially useful because the changes needed to smooth the TiDB integration have made dbdeployer better suited to add support for more not-quite-MySQL servers, such as the one that we’ll see in the next post.

But before reaching that point, here’s an example of TiDB deployment on Linux:

$ wget https://download.pingcap.org/tidb-master-linux-amd64.tar.gz  
[...]  
2019-02-24 04:46:26 (2.26 MB/s) - 'tidb-master-linux-amd64.tar.gz' saved [16304317/16304317]

$ dbdeployer unpack tidb-master-linux-amd64.tar.gz --unpack-version=3.0.0 --prefix=tidb  
Unpacking tarball tidb-master-linux-amd64.tar.gz to $HOME/opt/mysql/tidb3.0.0  
Renaming directory /home/msandbox/opt/mysql/tidb-master-linux-amd64 to /home/msandbox/opt/mysql/tidb3.0.0

TiDB tarballs don't come with a client. We need to use one from MySQL 5.7. Rather than downloading the huge tarball from the MySQL site, we can get a smaller one from a GitHub repository, using dbdeployer itself (NB: this reduced tarball is only for Linux).


$ dbdeployer remote list  
Files available in https://raw.githubusercontent.com/datacharmer/mysql-docker-minimal/master/dbdata/available.json  
5.7 -> [mysql-5.7.24 mysql-5.7.25]  
8.0 -> [mysql-8.0.13 mysql-8.0.15]  
4.1 -> [mysql-4.1.22]  
5.0 -> [mysql-5.0.15 mysql-5.0.96]  
5.1 -> [mysql-5.1.72]  
5.5 -> [mysql-5.5.61 mysql-5.5.62]  
5.6 -> [mysql-5.6.41 mysql-5.6.43]

$ dbdeployer remote get mysql-5.7.25  
File /home/msandbox/mysql-5.7.25.tar.xz downloaded

$ dbdeployer unpack mysql-5.7.25.tar.xz  
[...]  
Renaming directory /home/msandbox/opt/mysql/mysql-5.7.25 to /home/msandbox/opt/mysql/5.7.25

Now we are ready to install TiDB:

$ dbdeployer deploy single tidb3.0.0 --client-from=5.7.25  
Creating directory /home/msandbox/sandboxes  
Database installed in $HOME/sandboxes/msb_tidb3_0_0  
run 'dbdeployer usage single' for basic instructions'  
. sandbox server started

Once installed, a TiDB sandbox behaves like a MySQL sandbox.

$ $HOME/sandboxes/msb_tidb3_0_0/use  
Welcome to the MySQL monitor.  Commands end with ; or \g.  
Your MySQL connection id is 2  
Server version: 5.7.10-TiDB-v3.0.0-beta-111-g266ff4b6f MySQL Community Server (Apache License 2.0)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its  
affiliates. Other names may be trademarks of their respective  
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost:3000] {msandbox} ((none)) >
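
As a quick check of that transparency, the standard sandbox scripts mentioned earlier apply unchanged (a sketch, with output omitted):

$ ~/sandboxes/msb_tidb3_0_0/status    # reports whether the server is running
$ ~/sandboxes/msb_tidb3_0_0/test_sb   # runs the standard sandbox test
$ ~/sandboxes/msb_tidb3_0_0/stop      # stops the TiDB server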

Monday, April 02, 2018

dbdeployer GA and semantic versioning

dbdeployer went into release candidate status a few weeks ago. Since then, I added no new features, but a lot of tests. The test suite now runs 3,000+ tests on MacOS and a bit more on Linux, for a grand total of 6,000+ tests that need to run at least twice: once with concurrency enabled and once without. I know that testing can't prove the absence of bugs, but I am satisfied with the results, since all this grinding has allowed me to find several bugs and fix them.

In this framework, I felt that dbdeployer could exit candidate status and get to version 1.0. This happened on March 26th. An immediate side effect of this change is that from this point on, dbdeployer must adhere to the semantic versioning principles:

A version number is made of Major, Minor, and Revision. When changes are applied, the following happens:

  • Backward-compatible bug fixes increment the Revision number (e.g. 1.0.0 to 1.0.1)
  • Backward-compatible new features increment the Minor number (1.0.1 to 1.1.0)
  • Backward incompatible changes (either features or bug fixes that break compatibility with the API) increment the Major number (1.15.9 to 2.0.0)

The starting API is defined in API-1.0.md, which was generated manually.
The file API-1.1.md contains the same API definition, but was generated automatically and can be used to better compare the initial API with further versions.

So the app went from 1.0 to 1.1 in less than one week. In obedience to semantic versioning principles, if a new backward-compatible feature is added, the minor number of the version increases. What does backward-compatible mean? It means that commands, procedures, and workflows that were working with the previous version will also work with the current one. It's just that the new release has more capabilities. In this case, the added feature is the ability to have the environment variables HOME and PWD recognized and properly expanded in the configuration file. It's nothing very exciting, but changing the minor number gives the user a hint of what to expect from the new release.

Let's give a few examples:

  • Version goes from 1.0.0 to 1.0.1: It means that there are only bug fixes, and you should expect to use it without modifications.
  • Version goes from 1.0.1 to 1.1.0: You should be able to use dbdeployer just as before, but you should check the release notes to see what's new, because there are new functionalities that might be useful to you.
  • Version goes from 1.3.15 to 2.0.0: Danger! A major number bumped up means that something has changed in the API, which is now partially or totally incompatible with the previous release. Your workflow may break, and you must check the release notes and the documentation to learn how to use the new version.

This is different from other applications. For example, the MySQL server uses version numbers with hard-to-predict meanings:

  • MySQL 5.1, 5.5, 5.6, and 5.7 should be, in fact, major version number changes, not minor ones. Each one of them introduces incompatible changes that require careful review of the novelties.
  • Within the same version (such as MySQL 5.7) there are a lot of compatible and incompatible changes, although the minor number stays the same.

The plan with dbdeployer is to use the version number as a manifest, to give users an immediate feeling of what to expect. Rather than changing minor or major number only when the developers think there is some juicy new thing of which they can be proud, the version number will tell whether users should worry about compatibility or not.

In my general development plan, you are more likely to see versions like "1.25.16" than version "2.0," meaning that I will try to keep the current API valid as much as possible. A major version change will signify that a new feature could not fit in the current infrastructure and a new one would be needed.

You can draw your own conclusions here. A semantic versioning paradigm is unlikely to be adopted by most software vendors, because version numbers are often marketing gimmicks, and they can charge you more convincingly for a version 6.0 than for version 1.34.
Free software, OTOH, can do this. My goal with dbdeployer is to help the MySQL community, and I will achieve that goal more easily if my releases can be adopted without fear of incompatibility.


Monday, March 05, 2018

MySQL security for real users


Security features overview

One of Oracle's tenets is the focus on security. For this reason, when it took over the stewardship of MySQL, it started addressing the most common issues. It was not quick to act, but we have seen real progress:

  1. MySQL 5.7 has removed the anonymous accounts, which were the greatest threat to security. Because of those accounts, and the default privileges granted to them, users without any privileges could access the "test" database and do serious damage. Additionally, because of the way the privilege engine evaluates accounts, anonymous users could hijack legitimate users, preventing them from working properly.
  2. The "root" account now comes with a password defined during initialization. This is good news for security, but bad news for how the change was implemented.
  3. There is a new way of setting an options file for connection credentials: mysql_config_editor, paired with the option --login-path, allows users to store encrypted credentials for secure use. Here too, while we should rejoice at the added security, we can't help feeling that the implementation is yet again far from meeting users' needs.
  4. There is a useful warning (introduced in MySQL 5.6) when using a password on the command line, telling users that it is a risk. Also in this case, we have a usability issue: while users care about their production deployments and use option files to avoid passwords on the command line, there are nonetheless a lot of testing scripts, used in safe environments or with non-valuable data, where a password on the command line was not an issue, and the new warning simply screws up the results of those carefully crafted tests. This change, which can't be overcome without modifying the MySQL client code, requires users to change their existing tests to adapt to the new behavior.
  5. MySQL 8 introduces roles, which simplify the accounts management. There are some minor usability issues, although in general the feature meets expectations.

This is the landscape of the main security enhancements in MySQL since 5.6. Each one of them has some usability problems, some minor, some really bad.
We will first have a look at the problems mentioned above, and then examine the root cause of why they have arisen.


Usability issues

I start by noticing that some developers in the MySQL team have been working there for many years, starting with the time when MySQL was a different database and was used really differently.
In those times, managing the database meant that a human (the DBA) would run operations manually, take a look at the result, and adjust when needed. And then, when things went wrong, the same human explored the database system to find out what happened, took action, and went back to sleep.

Human-centered management leads to human problems: lazy DBAs left their databases without a password, used the root account, and exposed the server to uninspired attacks; they used passwords on the command line, without caring for option files (or without knowing about them). Careless DBAs did not deal with anonymous users, leaving a dangerous backdoor in their servers.

Some of the new functionalities introduced in the latest MySQL versions are aimed at this type of user: when you install MySQL, you get a message saying "your root password is ************", and the lazy DBAs have no option but to take note of it and use it. When they use the password on the command line, the annoying warning forces them to start using an options file or mysql_config_editor.

This is all good, but the main problem here is that the DBAs of 10 years ago are on the verge of extinction. They are replaced by a new breed of DBAs who are not lazy, because they can't afford to be, and need to use dozens, hundreds, thousands of databases at once, using configuration management tools that don't require manual intervention, and actually abhor it. In the land of automation, some of the MySQL security enhancements are not seen as a solution, but as new problems.

Let's see an interesting example: docker containers.

Using Docker, MySQL images are deployed using a password on the command line. This is done for compatibility with the first implementation of the image maintained by the Docker team, where you deploy with this syntax:

docker run -e MYSQL_ROOT_PASSWORD=secret -d mysql

The MYSQL_ROOT_PASSWORD is a directive that becomes an environment variable inside the container, and the server uses it during initialization. As you can imagine, this is not recommended for a secure environment. Then, what's the MySQL team's recommendation? They suggest the same strategy used for manual installation: set a directive MYSQL_RANDOM_ROOT_PASSWORD that results in a random password being generated, then collected by the DBA and used. Alternatively, the directive MYSQL_ONETIME_PASSWORD will force the root user to change the password on first connection.
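
For instance, the random-password route looks like this (a sketch: the container name is made up, and the log-scraping step is exactly the manual intervention that automation abhors):

$ docker run --name mysql1 -e MYSQL_RANDOM_ROOT_PASSWORD=yes -d mysql
$ docker logs mysql1 2>&1 | grep 'GENERATED ROOT PASSWORD'
# someone (or some tool) must now extract and store this password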

The above suggestions were designed with the ancient DBA still in mind, while container deployment is even more automated than VMs, and is based on the principle of immutable objects, i.e. containers that spring up from the cloud ready to run, with no configuration needed, and especially no configuration that requires someone (or some tool) to extract a new password from a log. I proposed a different solution, one that would never show passwords on the command line; it was implemented, but it still feels like a hack to circumvent an inadequate design.

As a result, the implementation inside the MySQL-recommended Docker image uses "--initialize-insecure" to start the server. This is an implicit recognition of the bad design of the initialization feature. What was designed to overcome DBAs' laziness becomes an obstacle towards automation.

We have a similar problem with mysql_config_editor: the tool will create a safe configuration file with credentials for multiple instances, but the password must be inserted manually. Consequently, this potentially useful feature doesn't get adopted, because it would be too difficult or impossible to automate properly.
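
The manual step is evident in a typical session (host and login-path names are hypothetical):

$ mysql_config_editor set --login-path=prod --host=db1.example.com --user=app --password
Enter password:             # interactive prompt: this is the automation obstacle
$ mysql --login-path=prod   # afterwards, credentials come from ~/.mylogin.cnf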

We have seen that, of the security features introduced lately, only a few can be used safely in an automated environment, and all of them have at least one small usability quirk. I have talked about a confusing issue related to the removal of anonymous users, where, in their eagerness to remove the vulnerability, the MySQL team also removed the "test" database, which was a consequence, not the cause, of the problem. And I have recently talked about roles usability, where there are still open problems, like the difficulty of telling roles from users, which is apparently not considered a bug by the MySQL team.

All the above considerations led me to ask: how did we get to this point? There is an active community, and feedback is offered often and with plenty of detail. How come we have such an abundance of usability issues? Don't the developers spend time with users at conferences to learn what they expect? Don't they read articles and blog posts about how a new feature meets expectations? Don't they talk to customers who have adopted new features? They certainly do. Then why do the usability problems persist?

What follows is my observation and speculation on this matter.


Disconnection between MySQL developers and users community

My experience working with system providers has put me in contact with many users. I have seen that in most cases users are very protective of their current deployments, because it took them a long time to get them right, and they don't upgrade unless they have no other choice. I've seen users test newer versions, realize that they would break some of their procedures, and defer the upgrade to better times that never come. I remember last year a user with a not-so-large set of servers was considering an upgrade to MySQL 5.6, while 5.7 had been GA for two years. The reason was a set of incompatibilities that made the upgrade too difficult.

For companies that deal with thousands of servers, the problem is similar, but exacerbated by the quantity of servers to upgrade and the need to do it without stopping operations. This latter requirement has made some users decide not to use GTID, because it required offline time for a master, and they hadn't had enough time to test the upgrade to MySQL 5.7 that would solve that problem.

For one reason or another, many companies upgrade only two or three years after a given version becomes GA. And this is the main problem: until they use it in production, or at least test the version for a projected upgrade, users can't give valuable feedback, the kind that is related to usage in production; and when they do, the version for which they provide feedback has been GA for a long time and can't be changed, while the next one is already close to GA, and as such untouchable.

The MySQL team gets feedback on a release from a handful of curious users who don't delay testing until the new version is GA, but these users don't provide the kind of important feedback that gets the development team's attention, such as deployment in production by large customers. In many cases, large customers are the ones that upgrade several years after GA, and by then their input is difficult to act upon.

We have, then, a curious situation, where the warnings given by the early software explorers are confirmed years later by the users to whom the MySQL team listens more closely, but by then the next version of the server is already locked into a release schedule that nobody wants to alter to fix usability issues.

How can we solve this problem? Simple: listen to early software explorers and try to fix problems before GA.

Wednesday, February 28, 2018

The confusing strategy for MySQL shell

Where the hell is it?

The MySQL shell is a potentially useful tool that has been intentionally made difficult to use properly.

It was introduced, with much fanfare, with the MySQL Document Store, as THE tool to bridge the SQL and NoSQL worlds. The release was less than satisfactory, though: MySQL 5.7.12 introduced a new feature (the X protocol plugin) bundled with the server. The maturity of the plugin was unclear, as it popped out of the unknown into a GA release, without any public testing. It was allegedly of GA quality, although the quantity of bug reports filed soon after the release proved otherwise. The maturity of the shell was labeled "development preview", and so we had a supposedly GA feature that could only be used with an alpha-quality tool.

The situation with the MySQL shell got worse in a few months. A new product was brewing (MySQL Group Replication) and went rapidly from something released in the Labs without docs to being part of the regular server distribution, and it was evolving into a more complex and ambitious project (the InnoDB Cluster) which used the MySQL shell as its main tool.

Since the announcement of InnoDB Cluster, using the MySQL shell has been a nightmare. You saw examples in blog posts and presentations, and when you tried them at home, they did not work. There were different releases of MySQL shell with the same version number but different capabilities, depending on whether they were released through the main downloads site or through the labs.

When I asked why the shell wasn't distributed with the server, like the other tools, I was told that a non-GA product could not be released with a GA server. Considering that the Document Store is still walking around with a Pre-Production status legal notice, this was an odd excuse.

Still, I kept waiting, trying to figure out how to pair a given version of MySQL shell with a given version of the server. Unlike the server, there are no release notes for the shell, so every release was a surprising experience.

Eventually, the MySQL shell reached the GA state; with what merit, I can't tell. Given the obstacles in the path to its usage, I doubt it has had any serious testing from the community. Despite its GA state, it keeps being released separately, leaving puzzled users with the ungrateful task of determining which server version a given shell can be used with safely.

With the upcoming release of MySQL 8.0, a new version of MySQL shell appeared, with a colorful prompt and new features that the GA shell doesn't have. The public perception of the tool keeps getting more confused. In the presentations given by the MySQL team we see the new shell doing wonders, while the GA shell keeps its monochromatic features. Shall I use the 8.0.x shell with a 5.7 server or should I stick with the 1.0 version?

In MySQL 8.0, the situation is still divided. Both products (the server and the shell) are, as of today, not GA yet. It would make sense to finally end the craziness and put the two things together, so that users don't have to hunt around for the right shell version. But the two products are still released separately.


How can I do stuff with MySQL shell?

So far, we have only seen the availability of the shell. What about the functionality?

I have heard that Oracle wants to convert the shell into the only tool to deal with MySQL. I can't prove it, as Oracle doesn't release its development plans to the public, but I can see the emphasis on the shell in talks and articles authored by MySQL team engineers. If this is the plan, I think it needs a lot more work.

If you try to use MySQL shell the same way as the regular "mysql" client, you get in trouble soon.

mysqlsh --user root --password=msandbox --port=5721 --host 127.0.0.1
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating a Session to 'root@127.0.0.1:5721'
Your MySQL connection id is 38
Server version: 5.7.21 MySQL Community Server (GPL)
No default schema selected; type \use  to set one.
MySQL Shell 1.0.11

Copyright (c) 2016, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help' or '\?' for help; '\quit' to exit.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.

I see a few problems here:

  • The warning about the password on the command line is legitimate. The trouble is that there is no alternative. mysqlsh does not support --defaults-file, and there is no way of giving a password other than directly at invocation. There is an option "--passwords-from-stdin" which does not seem to work, and even if it did, I can't see the advantage of using the password from a pipe.
  • The default mode is JavaScript. I can see that this makes operations simpler when you want to perform setup tasks for InnoDB Cluster, but it certainly doesn't help me use this tool as the primary driver for database management. There is a "--sql" option that does what I expect, but if this is not the default, I can't see this replacement being very successful.
  • Due to the previous items, using the tool in batch mode (with -e "SQL commands") is impossible, as every invocation will start with the freaking password warning.

I'm afraid that it's too late to take action for MySQL 8.0. The MySQL team is probably packaging the GA release while I write these notes. But I offer some suggestions nonetheless.


Wish list


  1. Package MySQL shell with the server. Past experience shows that the MySQL team keeps adding features into a GA release, thus exposing users to the risk of getting the wrong tool for the job. Having the shell and the server in the same tarball will help users pick the right version for the task. This is similar to what happens with mysqldump: using the tool from 5.5 with a 5.7+ server will not work properly. There is no reason for mysqlsh to be treated differently.
  2. Make sure that all the features of the mysql client work seamlessly in mysqlsh. Perhaps run the test suite replacing mysql with mysqlsh and pick up from there.
  3. Make the MySQL shell compatible with other tools. Specifically, it should support option files (--defaults-file, --defaults-extra-file, --defaults-group-suffix, --no-defaults)

In short, if the plan is to replace mysql with mysqlsh, put the thing in the open, and please make sure it can do what users can reasonably expect.

Sunday, September 24, 2017

Revisiting roles in MySQL 8.0

In my previous article about roles I said that one of the problems with role usage is that roles need to be activated before they kick in. Let's recap briefly what the problem is:

## new session, as user `root`

mysql [localhost] {root} ((none)) > create role viewer;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > grant select on *.* to viewer;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > create user see_it_all identified by 'msandbox';
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > grant viewer to see_it_all;
Query OK, 0 rows affected (0.01 sec)

## NEW session, as user `see_it_all`

mysql [localhost] {see_it_all} ((none)) > use test
ERROR 1044 (42000): Access denied for user 'see_it_all'@'%' to database 'test'

mysql [localhost] {see_it_all} ((none)) > show grants\G
*************************** 1. row ***************************
Grants for see_it_all@%: GRANT USAGE ON *.* TO `see_it_all`@`%`
*************************** 2. row ***************************
Grants for see_it_all@%: GRANT `viewer`@`%` TO `see_it_all`@`%`
2 rows in set (0.00 sec)

mysql [localhost] {see_it_all} (test) > select current_role();
+----------------+
| current_role() |
+----------------+
| NONE           |
+----------------+
1 row in set (0.00 sec)

We can create a simple role that gives read-only access to most database objects, and assign it to a user. However, when the new user tries to access a database, it is rejected. The problem is that the role must be activated, either permanently, or for the current session.

mysql [localhost] {see_it_all} ((none)) > set role viewer;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {see_it_all} (test) > select current_role();
+----------------+
| current_role() |
+----------------+
| `viewer`@`%`   |
+----------------+
1 row in set (0.00 sec)

mysql [localhost] {see_it_all} ((none)) > use test
Database changed

mysql [localhost] {see_it_all} (test) > show grants\G
*************************** 1. row ***************************
Grants for see_it_all@%: GRANT SELECT ON *.* TO `see_it_all`@`%`
*************************** 2. row ***************************
Grants for see_it_all@%: GRANT `viewer`@`%` TO `see_it_all`@`%`
2 rows in set (0.00 sec)

The main issue here is that the role is not active immediately. If we grant a given privilege to a user, the user will be able to operate under that privilege straight away. If we grant a role, instead, the user can't use it immediately. Roles need to be activated, either by the giver or by the receiver.
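
Using the names from the example above, either statement removes the surprise (a sketch of the two activation paths):

-- by the giver: root makes the role active by default at the user's login
SET DEFAULT ROLE viewer TO see_it_all;

-- or by the receiver: the user activates the role for the current session
SET ROLE viewer;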


Auto activating roles


In MySQL 8.0.2 there are two new features related to roles, and one of them addresses the main problem we have just seen. When we use activate_all_roles_on_login, all roles become active when the user starts a session, regardless of any role activation that may pre-exist. Let's try. In the previous example, as root, we issue this command:


mysql [localhost] {root} ((none)) > set global activate_all_roles_on_login=ON;
Query OK, 0 rows affected (0.00 sec)

Then, we connect as user see_it_all

mysql [localhost] {see_it_all} ((none)) > select current_role();
+----------------+
| current_role() |
+----------------+
| `viewer`@`%`   |
+----------------+
1 row in set (0.00 sec)

mysql [localhost] {see_it_all} ((none)) > use test
Database changed

The role is active. The current role can be overridden temporarily using SET ROLE:

mysql [localhost] {see_it_all} ((none)) > use test
Database changed

mysql [localhost] {see_it_all} (test) > set role none;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {see_it_all} (test) > select current_role();
+----------------+
| current_role() |
+----------------+
| NONE           |
+----------------+
1 row in set (0.00 sec)

mysql [localhost] {see_it_all} (test) > show tables;
ERROR 1044 (42000): Access denied for user 'see_it_all'@'%' to database 'test'

This is a good option, which can further simplify DBAs' work. There are, as usual, a few caveats:

  • This option has effect only on login, i.e. when the user starts a new session. Users that are already logged in when the option is changed will not be affected until they re-connect.
  • Use of this option can have adverse effects when using combinations of roles. If the DBA's intent is to give users several roles that should be used separately, using activate_all_roles_on_login will make the paradigm more difficult to use. Let's see an example:

CREATE ROLE payroll_viewer ;
GRANT SELECT ON payroll.* TO payroll_viewer;

CREATE ROLE payroll_updater;
GRANT CREATE, INSERT, UPDATE, DELETE ON payroll.* TO payroll_updater;

CREATE ROLE personnel_viewer ;
GRANT SELECT ON personnel.* TO personnel_viewer;

CREATE ROLE personnel_updater;
GRANT CREATE, INSERT, UPDATE, DELETE ON personnel.* TO personnel_updater;

CREATE ROLE payroll;
GRANT payroll_updater, payroll_viewer, personnel_viewer to payroll;

CREATE ROLE personnel;
GRANT personnel_updater, personnel_viewer to personnel;

CREATE USER pers_user identified by 'msandbox';
CREATE USER pers_manager identified by 'msandbox';
CREATE USER pay_user identified by 'msandbox';

GRANT personnel to pers_user;
GRANT personnel, payroll_viewer to pers_manager;
GRANT payroll to pay_user;

SET DEFAULT ROLE personnel TO pers_user;
SET DEFAULT ROLE personnel TO pers_manager;
SET DEFAULT ROLE payroll TO pay_user;

In the above situation, we want the user pers_manager to see the personnel records by default, but she needs to manually activate payroll_viewer to see the payroll.
If we set activate_all_roles_on_login, pers_manager would be able to see payroll info without further action.


Mandatory roles


Another option introduced in 8.0.2 is mandatory_roles. This variable can be set with a list of roles. When set, the roles in the list will be added to the privileges of all users, including future ones.

Here's an example of how this feature could be useful. We want a schema containing data that should be accessible to all users, regardless of their privileges.

CREATE SCHEMA IF NOT EXISTS company;

DROP TABLE IF EXISTS company.news;
CREATE TABLE company.news(
    id int not null auto_increment primary key,
    news_type ENUM('INFO', 'WARNING', 'ALERT'),
    contents MEDIUMTEXT);

DROP ROLE IF EXISTS news_reader;
CREATE ROLE news_reader;
GRANT SELECT ON company.* TO news_reader;
SET PERSIST mandatory_roles = news_reader;

In this example, every user that starts a session after mandatory_roles was set will be able to access the "company" schema and read the news from there.

There are at least two side effects of this feature:

  • When a role is included in the list of mandatory roles, it can't be dropped.
mysql [localhost] {root} (mysql) > drop role news_reader;
ERROR 4527 (HY000): The role `news_reader`@`%` is a mandatory role and can't be revoked or dropped. 
The restriction can be lifted by removing the role identifier from the global variable mandatory_roles, as sketched after the grants listing below.
  • Users who already have broad access covering the privileges in the mandatory role will nonetheless see the global role show up in their list of grants. For example, here is what the grants for 'root'@'localhost' look like:
mysql [localhost] {root} ((none)) > show grants \G
*************************** 1. row ***************************
Grants for root@localhost: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD,
 SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, 
 CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, 
 CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, 
 CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `root`@`localhost` WITH GRANT 
 OPTION
*************************** 2. row ***************************
Grants for root@localhost: GRANT BACKUP_ADMIN,BINLOG_ADMIN,CONNECTION_ADMIN,
 ENCRYPTION_KEY_ADMIN,GROUP_REPLICATION_ADMIN,PERSIST_RO_VARIABLES_ADMIN, 
 REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,
 SET_USER_ID,SYSTEM_VARIABLES_ADMIN,XA_RECOVER_ADMIN ON *.* TO `root`@`localhost` 
 WITH GRANT OPTION
*************************** 3. row ***************************
Grants for root@localhost: GRANT SELECT ON `company`.* TO `root`@`localhost`
*************************** 4. row ***************************
Grants for root@localhost: GRANT PROXY ON ''@'' TO 'root'@'localhost' WITH GRANT OPTION
*************************** 5. row ***************************
Grants for root@localhost: GRANT `news_reader`@`%` TO `root`@`localhost`
5 rows in set (0.00 sec)
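
As mentioned above, here is a minimal sketch of how to make the role droppable again, assuming news_reader is the only mandatory role:

-- empty the list (or list only the roles that should remain mandatory)
SET PERSIST mandatory_roles = '';
DROP ROLE news_reader;  -- now succeeds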

More gotchas

There are several commands for setting roles. One of them uses ALTER USER, while the rest use a SET command.

  • First gotcha: SET ROLE and SET DEFAULT ROLE don't need an equals sign (=). The syntax is similar to SET CHARACTER SET, not to SET variable. This is a bit confusing, because another security-related command (SET PASSWORD) requires the '=' in the assignment.

  • Now, for the really entertaining part, here's a list of commands that can give any DBA a headache.


Command                        Meaning
SET ROLE role_name             Activates the role role_name for the current session.
SET DEFAULT ROLE role_name     Sets the role role_name as default permanently.
SET ROLE DEFAULT               Activates the default role for the current session.
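
To make the differences concrete, here is a hedged sketch reusing the roles and users from the first example:

SET DEFAULT ROLE personnel TO pers_manager;  -- persistent: active at every new login
SET ROLE payroll_viewer;                     -- overrides the default for this session only
SET ROLE DEFAULT;                            -- back to the default role(s), still session-scoped
-- contrast with: SET PASSWORD = 'msandbox';  (the '=' is required here)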

State of bugs


Community members have reported several bugs about roles. While I am happy with the MySQL team's response concerning the usability of roles (the automatic activation came after I gave feedback), I am less thrilled to see that none of the public bugs reported on this matter have been addressed. Bug#85561 is particularly vexing. I reported that users can be assigned non-existing roles as default. I was answered with a sophism about the inner being of a default role, and the bug report was closed with a "Won't fix" state. I disagree with this characterisation. The behaviour that I reported is a bug because it allows users to write a wrong statement without a warning or an error. I hope the team will reconsider and take action to improve the usability of roles.

Tuesday, October 11, 2016

OTN appreciation day : MySQL 8.0 data dictionary

About one month ago, the MySQL team at Oracle released MySQL 8.0, with a large list of changes. One of the most interesting features in the new release is also one of the least visible, partly because the team has gone to great lengths to keep most of its implementation hidden: the data dictionary.


What makes the data dictionary so interesting, despite its scarce visibility, is the effect that it has on performance. Up to MySQL 5.7, searching the information_schema was an onerous operation, potentially crippling the system. In MySQL 8.0, the same operations are up to 100 times faster. This would be reason enough to be excited, as I know many users who have had terrible problems with the inefficiency of information_schema.
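
As an illustration (my own example, not taken from any published benchmark), this is the kind of metadata query that could cripple a busy 5.7 server:

-- in 5.7 this could force the server to open the .frm file of every table;
-- in 8.0 it reads the InnoDB dictionary tables instead
SELECT table_schema, COUNT(*) AS tables, SUM(data_length) AS data_bytes
FROM information_schema.tables
GROUP BY table_schema;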

But there are several other visible changes: the various files that were needed to identify database objects (.frm for tables, .trg for triggers, .par for partitions) are now gone. When the database server wants to know metadata about such objects, it doesn't have to open files anymore. All the information is stored in the hidden InnoDB tables of the data dictionary. If the hidden tables bother you, you can use a not so difficult hack to discover them, although most operations can be performed without inner knowledge of how the dictionary is organized. The idea is that the public interface (views in information_schema) should be considered reliable and used for future compatibility. Perhaps the hidden status will be lifted in one of the next releases: many in the community have given feedback in this direction.

What else is the data dictionary good for? It allows atomic DDL operations, which were not guaranteed when data for the dictionary was stored in files and MyISAM tables. A nice complement to the data dictionary is the elimination of all MyISAM tables from the system. Now the grant privileges are stored in InnoDB tables, which, probably for historical compatibility, were left visible.
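
For example (a sketch of the expected behaviour, not a transcript), a multi-table DROP now either completes or rolls back as a unit:

-- in 5.7 a crash halfway through could leave t1 gone and t2 in place;
-- with the dictionary in InnoDB the whole statement is atomic
DROP TABLE t1, t2;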

I am sure users will come up with other clever usages of the data dictionary when the system is more widely adopted and understood. For now, it's a wonderful toy to explore!


This post was suggested by Tim Hall, a well-known community champion among Oracle users, who maintains a rich web site of news and free technical info. Following his suggestion, this post joins the OTN appreciation day, a distributed community effort to show something useful, or pleasant, or both, related to the Oracle world.


For those not familiar with the Oracle Technology Network (OTN): it is the center of Oracle technology, the place where users can get all software (proprietary or open source) and other resources related to Oracle products. In the image below you may find several familiar names.

Monday, September 26, 2016

PerconaLive Amsterdam 2016 - Talks and shows

With Oracle Open World behind us, we are now getting ready for the next big event, i.e. the European edition of PerconaLive. I am going to be a presenter three times:

  • MySQL operations in Docker is a three-hour tutorial, and it will be an expansion of the talk by the same title presented at OOW. Attendees who want to play along can do so by coming prepared with Docker 1.11 or later and the following images already pulled (images marked [+] are mandatory, while [-] are optional):

    • mysql/mysql-server [+]
    • mysql/mysql-gr [+]
    • mysql [-]
    • percona/percona-server [-]
    • mariadb [-]
    • mysql/shell [-]
    • datacharmer/mysql-minimal-5.5 [-]
    • datacharmer/mysql-minimal-5.6 [-]
    • datacharmer/mysql-minimal-5.7 [-]
    • datacharmer/mysql-minimal-8.0 [-]
    • datacharmer/my-ubuntu [-]
    • datacharmer/my-debian [-]
    • datacharmer/my-centos [-]
  • The fastest customized MySQL deployment ever is a presentation where I show two ways of deploying MySQL on a custom server, with MySQL Sandbox and Docker.

  • The lightning talks will be held during the Community dinner at Booking.com, diamond sponsor of the conference. If you want to attend, you need to register, and if you want a free ticket for that dinner, there is still ONE SLOT for the lightning talks. Contact me directly with a title and an abstract if you want to be considered for that slot (finding my email is part of the challenge, but it should not be that difficult).

UPDATE: here's the lightning talks program. Percona should eventually add it to the schedule.

  • 19:00: arrival at Booking.com by boat
  • 19:15: welcoming speech and beginning of the LT
  • 20:00: food buffet opens, LT are not finished yet
  • 20:30: LT are done, buffet still open
  • 21:15: buffet closes

The talks accepted are the following:

  • Jaime Crespo
    dbhell: a tiny Python framework for the administration and monitoring of farms of MySQL servers
  • Federico Razzoli
    How to write slow and buggy stored procedures
  • Art Van Scheppingen
    Simple household rules that keep MySQL running
  • Pavel Tru
    What internal statistics every self-respecting dbms should have!
  • Giuseppe Maxia
    Fastest, smallest, newest: download and deploy MySQL at high speed
  • Ronald Bradford
    An Awe-Inspiring Journey Through the World of Numbers

Sunday, September 25, 2016

Showing the hidden tables in MySQL 8 data dictionary

The freshly released MySQL 8.0 includes a data dictionary, which makes MySQL much more reliable. Thanks to this feature, we don't have any '.frm' files, and querying the information_schema is 30x to 100x faster than in previous versions.

One drawback of the implementation is that the data dictionary tables are hidden by design.

While the reason is fully understandable (they don't want to commit to an interface that may change in the future), many curious users are disappointed, because openness is the basis of good understanding and feedback.

The problem of accessing the dictionary tables can be split into three parts:

  • Finding the list of tables;
  • Finding the description of the tables;
  • Getting the contents of each table.

The first part is easily solved. We know that the data dictionary tables are accessed from some information_schema views (the views are defined during initialization, at which point the DD tables are readable).

For example:

 show create view information_schema.tables\G
*************************** 1. row ***************************
                View: TABLES
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost`
SQL SECURITY DEFINER VIEW `TABLES` AS select `cat`.`name` AS
`TABLE_CATALOG`,`sch`.`name` AS `TABLE_SCHEMA`,`tbl`.`name` AS
`TABLE_NAME`,`tbl`.`type` AS `TABLE_TYPE`,if((`tbl`.`type` = 'BASE
TABLE'),`tbl`.`engine`,NULL) AS `ENGINE`,if((`tbl`.`type` = 'VIEW'),NULL,10)
AS `VERSION`,`tbl`.`row_format` AS `ROW_FORMAT`,`stat`.`table_rows` AS
`TABLE_ROWS`,`stat`.`avg_row_length` AS `AVG_ROW_LENGTH`,`stat`.`data_length`
AS `DATA_LENGTH`,`stat`.`max_data_length` AS
`MAX_DATA_LENGTH`,`stat`.`index_length` AS `INDEX_LENGTH`,`stat`.`data_free`
AS `DATA_FREE`,`stat`.`auto_increment` AS `AUTO_INCREMENT`,`tbl`.`created` AS
`CREATE_TIME`,`stat`.`update_time` AS `UPDATE_TIME`,`stat`.`check_time` AS
`CHECK_TIME`,`col`.`name` AS `TABLE_COLLATION`,`stat`.`checksum` AS
`CHECKSUM`,if((`tbl`.`type` =
'VIEW'),NULL,get_dd_create_options(`tbl`.`options`,if((ifnull(`tbl`.`partition_expression`,'NOT_PART_TBL')
= 'NOT_PART_TBL'),0,1))) AS
`CREATE_OPTIONS`,internal_get_comment_or_error(`sch`.`name`,`tbl`.`name`,`tbl`.`type`,`tbl`.`options`,`tbl`.`comment`)
AS `TABLE_COMMENT` from ((((`mysql`.`tables` `tbl` join `mysql`.`schemata`
`sch` on((`tbl`.`schema_id` = `sch`.`id`))) join `mysql`.`catalogs` `cat`
on((`cat`.`id` = `sch`.`catalog_id`))) left join `mysql`.`collations` `col`
on((`tbl`.`collation_id` = `col`.`id`))) left join `mysql`.`table_stats`
`stat` on(((`tbl`.`name` = `stat`.`table_name`) and (`sch`.`name` =
`stat`.`schema_name`)))) where (can_access_table(`sch`.`name`,`tbl`.`name`)
and (not(`tbl`.`hidden`)))
character_set_client: latin1
collation_connection: latin1_swedish_ci
1 row in set (0.00 sec)

Here we see many tables (such as mysql.schemata or mysql.collations) that don't show up when we run SHOW TABLES in the mysql database.

We can use a script like this to get the list of all tables:

MYSQL=~/sandboxes/msb_full_8_0_0/use
TABLES=$($MYSQL information_schema -BN -e 'show tables')

function show_tables
{
    # Loop over every information_schema table, keep only the views,
    # and extract the mysql.* tables mentioned in each view definition
    for T in $TABLES
    do
        is_view=$($MYSQL information_schema -BN -e "show create table $T\G" | grep ALGORITHM)
        if [ -n "$is_view" ]
        then
            $MYSQL information_schema -e "show create table $T\G" \
               | perl -lne 'print $1 while /mysql.\..(\w+)/g'
        fi
    done
}
show_tables | sort | uniq

This script goes through all information_schema tables, skips the ones that are not views, and then extracts from each view definition every table belonging to the mysql database. The result is this:

catalogs
character_sets
collations
columns
foreign_key_column_usage
foreign_keys
index_column_usage
indexes
index_stats
schemata
tables
table_stats

Good. Now we have the list of tables that we can't see. The second operation is getting the description.

So, I looked at the source code, and I found where the prohibition originates. From there, I saw that the table is accessible when the variable skip_dd_table_access_check is set. Looking at the variables inside the server, I did not find any skip_dd_table_access_check, which was no surprise: it would not make sense to provide this information in the open after going through the pains of making all DD tables unreachable.

I searched the code for the string skip_dd_table_access_check and I found out how it is used in the test suite. The key is using the debug build of the MySQL server.

Using MySQL Sandbox, with a sandbox made from the full tarball of MySQL 8.0, I run:

 ~/sandboxes/msb_full_8_0_0/restart --mysqld=mysqld-debug

Now I have loaded the debug-enabled server. Let's try:

$ ~/sandboxes/msb_full_8_0_0/use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 91
Server version: 8.0.0-dmr-debug MySQL Community Server - Debug (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost] {msandbox} (mysql) > show create table schemata\G
ERROR 3554 (HY000): Access to system table 'mysql.schemata' is rejected.

Having the debug build is not enough. We need to use the magic spell.

mysql [localhost] {msandbox} (mysql) > SET SESSION debug='+d,skip_dd_table_access_check';
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {msandbox} (mysql) > show create table schemata\G
*************************** 1. row ***************************
       Table: schemata
Create Table: CREATE TABLE `schemata` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `catalog_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) COLLATE utf8_bin NOT NULL,
  `default_collation_id` bigint(20) unsigned NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `catalog_id` (`catalog_id`,`name`),
  KEY `default_collation_id` (`default_collation_id`),
  CONSTRAINT `schemata_ibfk_1` FOREIGN KEY (`catalog_id`) REFERENCES `catalogs` (`id`),
  CONSTRAINT `schemata_ibfk_2` FOREIGN KEY (`default_collation_id`) REFERENCES  `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
1 row in set (0.00 sec)

Yay! The table is now visible! We can modify the above script as follows:

mysql_tables=$(show_tables | sort | uniq)
for T in $mysql_tables
do
    echo "-- $T "
    # the debug flag is session-scoped, so it must be set in the same
    # client invocation that runs the query
    $MYSQL -e "SET SESSION debug= '+d,skip_dd_table_access_check'; show create table mysql.$T\G"
done

And we get the description of every table in the data dictionary. Here it goes:

-- catalogs
*************************** 1. row ***************************
       Table: catalogs
Create Table: CREATE TABLE `catalogs` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(64) COLLATE utf8_bin NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- character_sets
*************************** 1. row ***************************
       Table: character_sets
Create Table: CREATE TABLE `character_sets` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `default_collation_id` bigint(20) unsigned NOT NULL,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `mb_max_length` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`),
  KEY `default_collation_id` (`default_collation_id`),
  CONSTRAINT `character_sets_ibfk_1` FOREIGN KEY (`default_collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=249 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- collations
*************************** 1. row ***************************
       Table: collations
Create Table: CREATE TABLE `collations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `character_set_id` bigint(20) unsigned NOT NULL,
  `is_compiled` tinyint(1) NOT NULL,
  `sort_length` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`),
  KEY `character_set_id` (`character_set_id`),
  CONSTRAINT `collations_ibfk_1` FOREIGN KEY (`character_set_id`) REFERENCES `character_sets` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=278 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- columns
*************************** 1. row ***************************
       Table: columns
Create Table: CREATE TABLE `columns` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `table_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 COLLATE utf8_tolower_ci NOT NULL,
  `ordinal_position` int(10) unsigned NOT NULL,
  `type` enum(/* removed */) COLLATE utf8_bin NOT NULL,
  `is_nullable` tinyint(1) NOT NULL,
  `is_zerofill` tinyint(1) DEFAULT NULL,
  `is_unsigned` tinyint(1) DEFAULT NULL,
  `char_length` int(10) unsigned DEFAULT NULL,
  `numeric_precision` int(10) unsigned DEFAULT NULL,
  `numeric_scale` int(10) unsigned DEFAULT NULL,
  `datetime_precision` int(10) unsigned DEFAULT NULL,
  `collation_id` bigint(20) unsigned DEFAULT NULL,
  `has_no_default` tinyint(1) DEFAULT NULL,
  `default_value` blob,
  `default_value_utf8` text COLLATE utf8_bin,
  `default_option` blob,
  `update_option` varchar(32) COLLATE utf8_bin DEFAULT NULL,
  `is_auto_increment` tinyint(1) DEFAULT NULL,
  `is_virtual` tinyint(1) DEFAULT NULL,
  `generation_expression` longblob,
  `generation_expression_utf8` longtext COLLATE utf8_bin,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `hidden` tinyint(1) NOT NULL,
  `options` mediumtext COLLATE utf8_bin,
  `se_private_data` mediumtext COLLATE utf8_bin,
  `column_key` enum('','PRI','UNI','MUL') COLLATE utf8_bin NOT NULL,
  `column_type_utf8` mediumtext COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `table_id` (`table_id`,`name`),
  UNIQUE KEY `table_id_2` (`table_id`,`ordinal_position`),
  KEY `collation_id` (`collation_id`),
  CONSTRAINT `columns_ibfk_1` FOREIGN KEY (`table_id`) REFERENCES `tables` (`id`),
  CONSTRAINT `columns_ibfk_2` FOREIGN KEY (`collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3450 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- foreign_key_column_usage
*************************** 1. row ***************************
       Table: foreign_key_column_usage
Create Table: CREATE TABLE `foreign_key_column_usage` (
  `foreign_key_id` bigint(20) unsigned NOT NULL,
  `ordinal_position` int(10) unsigned NOT NULL,
  `column_id` bigint(20) unsigned NOT NULL,
  `referenced_column_name` varchar(64) CHARACTER SET utf8 COLLATE utf8_tolower_ci NOT NULL,
  PRIMARY KEY (`foreign_key_id`,`ordinal_position`),
  UNIQUE KEY `foreign_key_id` (`foreign_key_id`,`column_id`,`referenced_column_name`),
  KEY `column_id` (`column_id`),
  CONSTRAINT `foreign_key_column_usage_ibfk_1` FOREIGN KEY (`foreign_key_id`) REFERENCES `foreign_keys` (`id`),
  CONSTRAINT `foreign_key_column_usage_ibfk_2` FOREIGN KEY (`column_id`) REFERENCES `columns` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- foreign_keys
*************************** 1. row ***************************
       Table: foreign_keys
Create Table: CREATE TABLE `foreign_keys` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `schema_id` bigint(20) unsigned NOT NULL,
  `table_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `unique_constraint_id` bigint(20) unsigned NOT NULL,
  `match_option` enum('NONE','PARTIAL','FULL') COLLATE utf8_bin NOT NULL,
  `update_rule` enum('NO ACTION','RESTRICT','CASCADE','SET NULL','SET DEFAULT') COLLATE utf8_bin NOT NULL,
  `delete_rule` enum('NO ACTION','RESTRICT','CASCADE','SET NULL','SET DEFAULT') COLLATE utf8_bin NOT NULL,
  `referenced_table_catalog` varchar(64) COLLATE utf8_bin NOT NULL,
  `referenced_table_schema` varchar(64) COLLATE utf8_bin NOT NULL,
  `referenced_table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `schema_id` (`schema_id`,`name`),
  UNIQUE KEY `table_id` (`table_id`,`name`),
  KEY `unique_constraint_id` (`unique_constraint_id`),
  CONSTRAINT `foreign_keys_ibfk_1` FOREIGN KEY (`schema_id`) REFERENCES `schemata` (`id`),
  CONSTRAINT `foreign_keys_ibfk_2` FOREIGN KEY (`unique_constraint_id`) REFERENCES `indexes` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=46 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- index_column_usage
*************************** 1. row ***************************
       Table: index_column_usage
Create Table: CREATE TABLE `index_column_usage` (
  `index_id` bigint(20) unsigned NOT NULL,
  `ordinal_position` int(10) unsigned NOT NULL,
  `column_id` bigint(20) unsigned NOT NULL,
  `length` int(10) unsigned DEFAULT NULL,
  `order` enum('UNDEF','ASC','DESC') COLLATE utf8_bin NOT NULL,
  `hidden` tinyint(1) NOT NULL,
  UNIQUE KEY `index_id` (`index_id`,`ordinal_position`),
  UNIQUE KEY `index_id_2` (`index_id`,`column_id`,`hidden`),
  KEY `f2` (`column_id`),
  CONSTRAINT `index_column_usage_ibfk_1` FOREIGN KEY (`index_id`) REFERENCES `indexes` (`id`),
  CONSTRAINT `index_column_usage_ibfk_2` FOREIGN KEY (`column_id`) REFERENCES `columns` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- indexes
*************************** 1. row ***************************
       Table: indexes
Create Table: CREATE TABLE `indexes` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `table_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 COLLATE utf8_tolower_ci NOT NULL,
  `type` enum('PRIMARY','UNIQUE','MULTIPLE','FULLTEXT','SPATIAL') COLLATE utf8_bin NOT NULL,
  `algorithm` enum('SE_SPECIFIC','BTREE','RTREE','HASH','FULLTEXT') COLLATE utf8_bin NOT NULL,
  `is_algorithm_explicit` tinyint(1) NOT NULL,
  `is_visible` tinyint(1) NOT NULL,
  `is_generated` tinyint(1) NOT NULL,
  `hidden` tinyint(1) NOT NULL,
  `ordinal_position` int(10) unsigned NOT NULL,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `options` mediumtext COLLATE utf8_bin,
  `se_private_data` mediumtext COLLATE utf8_bin,
  `tablespace_id` bigint(20) unsigned DEFAULT NULL,
  `engine` varchar(64) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `table_id` (`table_id`,`name`),
  KEY `tablespace_id` (`tablespace_id`),
  CONSTRAINT `indexes_ibfk_1` FOREIGN KEY (`table_id`) REFERENCES `tables` (`id`),
  CONSTRAINT `indexes_ibfk_2` FOREIGN KEY (`tablespace_id`) REFERENCES `tablespaces` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=235 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- index_stats
*************************** 1. row ***************************
       Table: index_stats
Create Table: CREATE TABLE `index_stats` (
  `schema_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `index_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `column_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `cardinality` bigint(20) unsigned DEFAULT NULL,
  UNIQUE KEY `schema_name` (`schema_name`,`table_name`,`index_name`,`column_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- schemata
*************************** 1. row ***************************
       Table: schemata
Create Table: CREATE TABLE `schemata` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `catalog_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) COLLATE utf8_bin NOT NULL,
  `default_collation_id` bigint(20) unsigned NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `catalog_id` (`catalog_id`,`name`),
  KEY `default_collation_id` (`default_collation_id`),
  CONSTRAINT `schemata_ibfk_1` FOREIGN KEY (`catalog_id`) REFERENCES `catalogs` (`id`),
  CONSTRAINT `schemata_ibfk_2` FOREIGN KEY (`default_collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- tables
*************************** 1. row ***************************
       Table: tables
Create Table: CREATE TABLE `tables` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `schema_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) COLLATE utf8_bin NOT NULL,
  `type` enum('BASE TABLE','VIEW','SYSTEM VIEW') COLLATE utf8_bin NOT NULL,
  `engine` varchar(64) CHARACTER SET utf8 NOT NULL,
  `mysql_version_id` int(10) unsigned NOT NULL,
  `row_format` enum('Fixed','Dynamic','Compressed','Redundant','Compact','Paged') COLLATE utf8_bin DEFAULT NULL,
  `collation_id` bigint(20) unsigned DEFAULT NULL,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `hidden` tinyint(1) NOT NULL,
  `options` mediumblob,
  `se_private_data` mediumtext COLLATE utf8_bin,
  `se_private_id` bigint(20) unsigned DEFAULT NULL,
  `tablespace_id` bigint(20) unsigned DEFAULT NULL,
  `partition_type` enum(/*removed*/) COLLATE utf8_bin DEFAULT NULL,
  `partition_expression` varchar(2048) COLLATE utf8_bin DEFAULT NULL,
  `default_partitioning` enum('NO','YES','NUMBER') COLLATE utf8_bin DEFAULT NULL,
  `subpartition_type` enum('HASH','KEY_51','KEY_55','LINEAR_HASH','LINEAR_KEY_51','LINEAR_KEY_55') COLLATE utf8_bin DEFAULT NULL,
  `subpartition_expression` varchar(2048) COLLATE utf8_bin DEFAULT NULL,
  `default_subpartitioning` enum('NO','YES','NUMBER') COLLATE utf8_bin DEFAULT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `view_definition` longblob,
  `view_definition_utf8` longtext COLLATE utf8_bin,
  `view_check_option` enum('NONE','LOCAL','CASCADED') COLLATE utf8_bin DEFAULT NULL,
  `view_is_updatable` enum('NO','YES') COLLATE utf8_bin DEFAULT NULL,
  `view_algorithm` enum('UNDEFINED','TEMPTABLE','MERGE') COLLATE utf8_bin DEFAULT NULL,
  `view_security_type` enum('DEFAULT','INVOKER','DEFINER') COLLATE utf8_bin DEFAULT NULL,
  `view_definer` varchar(93) COLLATE utf8_bin DEFAULT NULL,
  `view_client_collation_id` bigint(20) unsigned DEFAULT NULL,
  `view_connection_collation_id` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `schema_id` (`schema_id`,`name`),
  UNIQUE KEY `engine` (`engine`,`se_private_id`),
  KEY `engine_2` (`engine`),
  KEY `collation_id` (`collation_id`),
  KEY `tablespace_id` (`tablespace_id`),
  CONSTRAINT `tables_ibfk_1` FOREIGN KEY (`schema_id`) REFERENCES `schemata` (`id`),
  CONSTRAINT `tables_ibfk_2` FOREIGN KEY (`collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `tables_ibfk_3` FOREIGN KEY (`tablespace_id`) REFERENCES `tablespaces` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=322 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- table_stats
*************************** 1. row ***************************
       Table: table_stats
Create Table: CREATE TABLE `table_stats` (
  `schema_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_rows` bigint(20) unsigned DEFAULT NULL,
  `avg_row_length` bigint(20) unsigned DEFAULT NULL,
  `data_length` bigint(20) unsigned DEFAULT NULL,
  `max_data_length` bigint(20) unsigned DEFAULT NULL,
  `index_length` bigint(20) unsigned DEFAULT NULL,
  `data_free` bigint(20) unsigned DEFAULT NULL,
  `auto_increment` bigint(20) unsigned DEFAULT NULL,
  `checksum` bigint(20) unsigned DEFAULT NULL,
  `update_time` timestamp NULL DEFAULT NULL,
  `check_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`schema_name`,`table_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC

In addition to the tables referred to in views, there are three that are mentioned in the documentation but not implemented as views in information_schema: triggers, events, and routines.


show create table triggers\G
*************************** 1. row ***************************
       Table: triggers
Create Table: CREATE TABLE `triggers` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `schema_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `event_type` enum('INSERT','UPDATE','DELETE') COLLATE utf8_bin NOT NULL,
  `table_id` bigint(20) unsigned NOT NULL,
  `action_timing` enum('BEFORE','AFTER') COLLATE utf8_bin NOT NULL,
  `action_order` int(10) unsigned NOT NULL,
  `action_statement` longblob NOT NULL,
  `action_statement_utf8` longtext COLLATE utf8_bin NOT NULL,
  `created` timestamp(2) NOT NULL DEFAULT CURRENT_TIMESTAMP(2) ON UPDATE CURRENT_TIMESTAMP(2),
  `last_altered` timestamp(2) NOT NULL DEFAULT CURRENT_TIMESTAMP(2),
  `sql_mode` set(/*removed*/) COLLATE utf8_bin NOT NULL,
  `definer` varchar(93) COLLATE utf8_bin NOT NULL,
  `client_collation_id` bigint(20) unsigned NOT NULL,
  `connection_collation_id` bigint(20) unsigned NOT NULL,
  `schema_collation_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `schema_id` (`schema_id`,`name`),
  UNIQUE KEY `table_id` (`table_id`,`event_type`,`action_timing`,`action_order`),
  KEY `client_collation_id` (`client_collation_id`),
  KEY `connection_collation_id` (`connection_collation_id`),
  KEY `schema_collation_id` (`schema_collation_id`),
  CONSTRAINT `triggers_ibfk_1` FOREIGN KEY (`schema_id`) REFERENCES `schemata` (`id`),
  CONSTRAINT `triggers_ibfk_2` FOREIGN KEY (`table_id`) REFERENCES `tables` (`id`),
  CONSTRAINT `triggers_ibfk_3` FOREIGN KEY (`client_collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `triggers_ibfk_4` FOREIGN KEY (`connection_collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `triggers_ibfk_5` FOREIGN KEY (`schema_collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
show create table events\G
*************************** 1. row ***************************
       Table: events
Create Table: CREATE TABLE `events` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `schema_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `definer` varchar(93) COLLATE utf8_bin NOT NULL,
  `time_zone` varchar(64) COLLATE utf8_bin NOT NULL,
  `definition` longblob NOT NULL,
  `definition_utf8` longtext COLLATE utf8_bin NOT NULL,
  `execute_at` datetime DEFAULT NULL,
  `interval_value` int(11) DEFAULT NULL,
  `interval_field` enum('YEAR','QUARTER','MONTH','DAY','HOUR','MINUTE','WEEK','SECOND','MICROSECOND','YEAR_MONTH','DAY_HOUR','DAY_MINUTE','DAY_SECOND','HOUR_MINUTE','HOUR_SECOND','MINUTE_SECOND','DAY_MICROSECOND','HOUR_MICROSECOND','MINUTE_MICROSECOND','SECOND_MICROSECOND') COLLATE utf8_bin DEFAULT NULL,
  `sql_mode` set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','NOT_USED','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALUE_ON_ZERO','NO_BACKSLASH_ESCAPES','STRICT_TRANS_TABLES','STRICT_ALL_TABLES','NO_ZERO_IN_DATE','NO_ZERO_DATE','INVALID_DATES','ERROR_FOR_DIVISION_BY_ZERO','TRADITIONAL','NO_AUTO_CREATE_USER','HIGH_NOT_PRECEDENCE','NO_ENGINE_SUBSTITUTION','PAD_CHAR_TO_FULL_LENGTH') COLLATE utf8_bin NOT NULL,
  `starts` datetime DEFAULT NULL,
  `ends` datetime DEFAULT NULL,
  `status` enum('ENABLED','DISABLED','SLAVESIDE_DISABLED') COLLATE utf8_bin NOT NULL,
  `on_completion` enum('DROP','PRESERVE') COLLATE utf8_bin NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `last_executed` datetime DEFAULT NULL,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `originator` int(10) unsigned NOT NULL,
  `client_collation_id` bigint(20) unsigned NOT NULL,
  `connection_collation_id` bigint(20) unsigned NOT NULL,
  `schema_collation_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `schema_id` (`schema_id`,`name`),
  KEY `client_collation_id` (`client_collation_id`),
  KEY `connection_collation_id` (`connection_collation_id`),
  KEY `schema_collation_id` (`schema_collation_id`),
  CONSTRAINT `events_ibfk_1` FOREIGN KEY (`schema_id`) REFERENCES `schemata` (`id`),
  CONSTRAINT `events_ibfk_2` FOREIGN KEY (`client_collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `events_ibfk_3` FOREIGN KEY (`connection_collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `events_ibfk_4` FOREIGN KEY (`schema_collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC

show create table routines\G
*************************** 1. row ***************************
       Table: ROUTINES
Create Table: CREATE TEMPORARY TABLE `ROUTINES` (
  `SPECIFIC_NAME` varchar(64) NOT NULL DEFAULT '',
  `ROUTINE_CATALOG` varchar(512) NOT NULL DEFAULT '',
  `ROUTINE_SCHEMA` varchar(64) NOT NULL DEFAULT '',
  `ROUTINE_NAME` varchar(64) NOT NULL DEFAULT '',
  `ROUTINE_TYPE` varchar(9) NOT NULL DEFAULT '',
  `DATA_TYPE` varchar(64) NOT NULL DEFAULT '',
  `CHARACTER_MAXIMUM_LENGTH` int(21) DEFAULT NULL,
  `CHARACTER_OCTET_LENGTH` int(21) DEFAULT NULL,
  `NUMERIC_PRECISION` bigint(21) unsigned DEFAULT NULL,
  `NUMERIC_SCALE` int(21) DEFAULT NULL,
  `DATETIME_PRECISION` bigint(21) unsigned DEFAULT NULL,
  `CHARACTER_SET_NAME` varchar(64) DEFAULT NULL,
  `COLLATION_NAME` varchar(64) DEFAULT NULL,
  `DTD_IDENTIFIER` longtext,
  `ROUTINE_BODY` varchar(8) NOT NULL DEFAULT '',
  `ROUTINE_DEFINITION` longtext,
  `EXTERNAL_NAME` varchar(64) DEFAULT NULL,
  `EXTERNAL_LANGUAGE` varchar(64) DEFAULT NULL,
  `PARAMETER_STYLE` varchar(8) NOT NULL DEFAULT '',
  `IS_DETERMINISTIC` varchar(3) NOT NULL DEFAULT '',
  `SQL_DATA_ACCESS` varchar(64) NOT NULL DEFAULT '',
  `SQL_PATH` varchar(64) DEFAULT NULL,
  `SECURITY_TYPE` varchar(7) NOT NULL DEFAULT '',
  `CREATED` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `LAST_ALTERED` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `SQL_MODE` varchar(8192) NOT NULL DEFAULT '',
  `ROUTINE_COMMENT` longtext NOT NULL,
  `DEFINER` varchar(93) NOT NULL DEFAULT '',
  `CHARACTER_SET_CLIENT` varchar(32) NOT NULL DEFAULT '',
  `COLLATION_CONNECTION` varchar(32) NOT NULL DEFAULT '',
  `DATABASE_COLLATION` varchar(32) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Looking at the code again, I see that there are also two more tables, tablespaces and version:

show create table tablespaces\G
*************************** 1. row ***************************
       Table: tablespaces
Create Table: CREATE TABLE `tablespaces` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8_bin NOT NULL,
  `options` mediumtext COLLATE utf8_bin,
  `se_private_data` mediumtext COLLATE utf8_bin,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `engine` varchar(64) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC

show create table version\G
*************************** 1. row ***************************
       Table: version
Create Table: CREATE TABLE `version` (
  `version` int(10) unsigned NOT NULL,
  PRIMARY KEY (`version`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC

Now we can try the last part of our task, i.e., querying the data dictionary for some specific info.

mysql [localhost] {msandbox} (mysql) > SET SESSION debug='+d,skip_dd_table_access_check';
mysql [localhost] {msandbox} (mysql) > select * from version;
+---------+
| version |
+---------+
|       1 |
+---------+

mysql [localhost] {msandbox} (mysql) > select id, name from schemata;
+----+--------------------+
| id | name               |
+----+--------------------+
|  2 | information_schema |
|  1 | mysql              |
|  3 | performance_schema |
|  4 | sys                |
|  5 | test               |
+----+--------------------+


mysql [localhost] {msandbox} (mysql) > select id, name, type, engine, mysql_version_id, comment from tables where name = 'user' and schema_id=1;
+----+------+------------+--------+------------------+-----------------------------+
| id | name | type       | engine | mysql_version_id | comment                     |
+----+------+------------+--------+------------------+-----------------------------+
| 84 | user | BASE TABLE | InnoDB |            80000 | Users and global privileges |
+----+------+------------+--------+------------------+-----------------------------+

Now the data dictionary is much more readable!

DISCLAIMER: there may be a simpler or more elegant solution to this problem. The method shown here is what I found through research. If there is a better method, short of recompiling the server, I'd like to know.

WARNING: Don't do what I do in the following paragraphs!

To complete the experiment, I am going to do what the MySQL team does not want me to do at all.

First, I create a directory inside the data directory. As shown in data dictionary limitations, this is not supported. But since we can access the data dictionary ...


$ mkdir ~/sandboxes/msb_full_8_0_0/data/db1

Now for the felony part:

mysql [localhost] {msandbox} (mysql) > SET SESSION debug= '+d,skip_dd_table_access_check';
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {msandbox} (mysql) > select * from schemata;
+----+------------+--------------------+----------------------+---------------------+---------------------+
| id | catalog_id | name               | default_collation_id | created             | last_altered        |
+----+------------+--------------------+----------------------+---------------------+---------------------+
|  1 |          1 | mysql              |                    8 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
|  2 |          1 | information_schema |                   33 | 2016-09-25 18:06:00 | 2016-09-25 18:06:00 |
|  3 |          1 | performance_schema |                   33 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
|  4 |          1 | sys                |                   33 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
|  5 |          1 | test               |                    8 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
+----+------------+--------------------+----------------------+---------------------+---------------------+
5 rows in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > insert into schemata values (6, 1, 'db1', 8, now(), now());
Query OK, 1 row affected (0.01 sec)

mysql [localhost] {msandbox} (mysql) > show schemas;
+--------------------+
| Database           |
+--------------------+
| db1                | ## TA-DA!
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+

Now, pay attention! This is why the MySQL team doesn't want anyone to mess with the data dictionary tables.

DOUBLE WARNING! Don't do the following!


mysql [localhost] {msandbox} (mysql) > insert into schemata values (7, 1, 'db2', 8, now(), now());
Query OK, 1 row affected (0.01 sec)

mysql [localhost] {msandbox} (mysql) > show schemas;
+--------------------+
| Database           |
+--------------------+
| db1                |
| db2                |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+
7 rows in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > use db2
ERROR 1049 (42000): Unknown database 'db2'

There! I broke the system. Lesson learned: read, don't write, the data dictionary tables.