Wednesday, February 21, 2018

Meet dbdeployer: the new sandbox maker


How it happened


A few years ago I started thinking about refactoring MySQL-Sandbox. I got lots of ideas and a name for the project (dbdeployer) but went no further. The initial idea (this was 2013!) was to rewrite the project in Ruby: I had been using Ruby at work and it looked like a decent replacement for Perl. My main problem was the difficulty of installation in an uncontrolled environment. If you have control over your environment (it's your laptop or you are in charge of the server configuration via Puppet or similar) then the task is easy. But if you ever need to deploy somewhere with little or no notice, it becomes a problem: there are servers where Perl is not installed, and is common that the server also have a policy forbidding all scripting languages from being deployed. Soon I found out that Ruby has the same problem as Perl. In the meantime, my work also required heavy involvement with Python, and I started thinking that maybe it would be a better choice than Ruby.
My adventures with deployment continued. In some places, I would find old versions of Perl, Ruby, Python, and no way of replacing them easily. I also realized that, if I bit the bullet and wrote my tools in C or C++, my distribution problems would not end, as I had to deal with library dependencies and conflict with existing ones.
At the end of 2017 I finally did what I had postponed for so long: I took a serious look at Go, and I decided that it was the best candidate for solving the distribution problem. I had a few adjustment problems, as the Go philosophy is different from my previously used languages, but the advantages were so immediate that I was hooked. Here's what I found compelling:

  • Shift in responsibility: with all the other languages I have used, the user is responsible for providing the working environment, such as installing libraries, the language itself, solve conflicts, and so on, until the program can work. With Go, the responsibility is on the developers only: they are supposed to know how to collect the necessary packages and produce a sound executable. Users only need to download the executable and run it.
  • Ease of deployment. A Go executable doesn't have dependencies. Binaries can be compiled for several platforms from a single origin (I can build Linux executables in my Mac and vice versa) and they just work.
  • Ease of development. Go is a strongly typed language, and has a different approach at code structure than Perl or Python. But this doesn't slow down my coding: it forces me to write better code, resulting in something that is at the same time more robust and easy to extend.
  • Wealth of packages. Go has an amazingly active community, and there is an enormous amount of packages ready for anything.

What is dbdeployer?


UPDATE 28-Feb-2018: The commands "single", "replication", and "multiple" are now subcommand of "deploy". Also, "templates" is now a subcommand of "defaults".

The first goal of dbdeployer is to replace MySQL-Sandbox completely. As such, it has all the main features of MySQL Sandbox, and many more (See the full list of features at the end of this text.)

You can deploy a single sandbox, or multiple unrelated sandboxes, or several servers in replication. That you could do also with MySQL-Sandbox. The first difference is in the command structure:

$ dbdeployer
dbdeployer makes MySQL server installation an easy task.
Runs single, multiple, and replicated sandboxes.

Usage:
  dbdeployer [command]

Available Commands:
  admin       administrative tasks
  delete      delete an installed sandbox
  global      Runs a given command in every sandbox
  help        Help about any command
  multiple    create multiple sandbox
  replication create replication sandbox
  sandboxes   List installed sandboxes
  single      deploys a single sandbox
  templates   Admin operations on templates
  unpack      unpack a tarball into the binary directory
  usage       Shows usage of installed sandboxes
  versions    List available versions

Flags:
      --base-port int                 Overrides default base-port (for multiple sandboxes)
      --bind-address string           defines the database bind-address  (default "127.0.0.1")
      --config string                 configuration file (default "$HOME/.dbdeployer/config.json")
      --custom-mysqld string          Uses an alternative mysqld (must be in the same directory as regular mysqld)
  -p, --db-password string            database password (default "msandbox")
  -u, --db-user string                database user (default "msandbox")
      --expose-dd-tables              In MySQL 8.0+ shows data dictionary tables
      --force                         If a destination sandbox already exists, it will be overwritten
      --gtid                          enables GTID
  -h, --help                          help for dbdeployer
  -i, --init-options strings          mysqld options to run during initialization
      --keep-auth-plugin              in 8.0.4+, does not change the auth plugin
      --keep-server-uuid              Does not change the server UUID
      --my-cnf-file string            Alternative source file for my.sandbox.cnf
  -c, --my-cnf-options strings        mysqld options to add to my.sandbox.cnf
      --port int                      Overrides default port
      --post-grants-sql strings       SQL queries to run after loading grants
      --post-grants-sql-file string   SQL file to run after loading grants
      --pre-grants-sql strings        SQL queries to run before loading grants
      --pre-grants-sql-file string    SQL file to run before loading grants
      --remote-access string          defines the database access  (default "127.%")
      --rpl-password string           replication password (default "rsandbox")
      --rpl-user string               replication user (default "rsandbox")
      --sandbox-binary string         Binary repository (default "$HOME/opt/mysql")
      --sandbox-directory string      Changes the default sandbox directory
      --sandbox-home string           Sandbox deployment direcory (default "$HOME/sandboxes")
      --skip-load-grants              Does not load the grants
      --use-template strings          [template_name:file_name] Replace existing template with one from file
      --version                       version for dbdeployer

Use "dbdeployer [command] --help" for more information about a command.

MySQL-Sandbox was created in 2006, and its structure changed as needed, without a real plan. dbdeployer, instead, was designed to have a hierarchical command structure, similar to git or docker, to give users a better feeling. As a result, it has a leaner set of commands, a non-awkward way of using options, and offers a better control of the operations out of the box.

For example, here's how we would start to run sandboxes:

$ dbdeployer --unpack-version=8.0.4 unpack mysql-8.0.4-rc-linux-glibc2.12-x86_64.tar.gz
Unpacking tarball mysql-8.0.4-rc-linux-glibc2.12-x86_64.tar.gz to $HOME/opt/mysql/8.0.4
.........100.........200.........292

The first (mandatory) operation is to expand binaries from a tarball. By default, the files will be expanded to $HOME/opt/mysql. Once this is done, we can create sandboxes at will, with simple commands:

$ dbdeployer single 8.0.4
Database installed in $HOME/sandboxes/msb_8_0_4
run 'dbdeployer usage single' for basic instructions'
. sandbox server started

$ dbdeployer replication 8.0.4
[...]
Replication directory installed in /$HOME/sandboxes/rsandbox_8_0_4
run 'dbdeployer usage multiple' for basic instructions'

$ dbdeployer multiple 8.0.4
[...]
Multiple directory installed in $HOME/sandboxes/multi_msb_8_0_4
run 'dbdeployer usage multiple' for basic instructions'

$ dbdeployer sandboxes
msb_8_0_4            : single                    8.0.4 [8004]
multi_msb_8_0_4      : multiple                  8.0.4 [24406 24407 24408]
rsandbox_8_0_4       : master-slave              8.0.4 [19405 19406 19407]

Three differences between dbdeployer and MySQL-Sandbox:

  • There is only one executable, with different commands;
  • After each deployment, there is a suggestion on how to get help about the sandbox usage.
  • There is a command that displays which sandboxes were installed, the kind of deployment, and the ports in use. This will be useful when the ports increase, as in group replication.

Here's another take, after deploying group replication:

$ dbdeployer sandboxes
group_msb_8_0_4      : group-multi-primary   8.0.4 [20405 20530 20406 20531 20407 20532]
group_sp_msb_8_0_4   : group-single-primary  8.0.4 [21405 21530 21406 21531 21407 21532]
msb_8_0_4            : single                8.0.4 [8004]
multi_msb_8_0_4      : multiple              8.0.4 [24406 24407 24408]
rsandbox_8_0_4       : master-slave          8.0.4 [19405 19406 19407]

A few more differences from MySQL-Sandbox are the "global" and "delete" commands.
The "global" command can broadcast a command to all the sandboxes. You can start, stop, restart all sandboxes at once, or run a query everywhere.

$ dbdeployer global use "select @@server_id, @@port, @@server_uuid"
# Running "use_all" on group_msb_8_0_4
# server: 1
@@server_id @@port  @@server_uuid
100 20405   00020405-1111-1111-1111-111111111111
# server: 2
@@server_id @@port  @@server_uuid
200 20406   00020406-2222-2222-2222-222222222222
# server: 3
@@server_id @@port  @@server_uuid
300 20407   00020407-3333-3333-3333-333333333333

# Running "use_all" on group_sp_msb_8_0_4
# server: 1
@@server_id @@port  @@server_uuid
100 21405   00021405-1111-1111-1111-111111111111
# server: 2
@@server_id @@port  @@server_uuid
200 21406   00021406-2222-2222-2222-222222222222
# server: 3
@@server_id @@port  @@server_uuid
300 21407   00021407-3333-3333-3333-333333333333

# Running "use" on msb_8_0_4
@@server_id @@port  @@server_uuid
1   8004    00008004-0000-0000-0000-000000008004
[...]

You can run the commands manually. dbdeployer usage will show which commands are available for every sandbox.

$ dbdeployer usage single

    USING A SANDBOX

Change directory to the newly created one (default: $SANDBOX_HOME/msb_VERSION
for single sandboxes)
[ $SANDBOX_HOME = $HOME/sandboxes unless modified with flag --sandbox-home ]

The sandbox directory of the instance you just created contains some handy
scripts to manage your server easily and in isolation.

"./start", "./status", "./restart", and "./stop" do what their name suggests.
start and restart accept parameters that are eventually passed to the server.
e.g.:

  ./start --server-id=1001

  ./restart --event-scheduler=disabled

"./use" calls the command line client with the appropriate parameters,
Example:

    ./use -BN -e "select @@server_id"
    ./use -u root

"./clear" stops the server and removes everything from the data directory,
letting you ready to start from scratch. (Warning! It's irreversible!)

When you don't need the sandboxes anymore, you can dismiss them with a single command:

$ dbdeployer delete ALL
Deleting the following sandboxes
$HOME/sandboxes/group_msb_8_0_4
$HOME/sandboxes/group_sp_msb_8_0_4
$HOME/sandboxes/msb_8_0_4
$HOME/sandboxes/multi_msb_8_0_4
$HOME/sandboxes/rsandbox_8_0_4
Do you confirm? y/[N]

There is an option to skip the confirmation, which is useful for scripting unattended tests.


Customization


One of the biggest problems with MySQL-Sandbox was that most of the functioning is hard-coded, and the scripts needed to run the sandboxes are generated in different places, so that extending or modifying features became more and more difficult. When I designed dbdeployer, I gave myself the goal of making the tool easy to change, and the code easy to understand and extend.

For this reason, I organized everything related to code generation (the scripts that initialize and run the sandboxes) in a collection of templates and default variables that are publicly visible and modifiable.

$ dbdeployer templates -h
The commands in this section show the templates used
to create and manipulate sandboxes.

Usage:
  dbdeployer templates [command]

Aliases:
  templates, template, tmpl, templ

Available Commands:
  describe    Describe a given template
  export      Exports all templates to a directory
  import      imports all templates from a directory
  list        list available templates
  reset       Removes all template files
  show        Show a given template

You can list the templates on the screen.

$ dbdeployer templates list single
    [single]      replication_options       : Replication options for my.cnf
    [single]      load_grants_template      : Loads the grants defined for the sandbox
    [single]      grants_template57         : Grants for sandboxes from 5.7+
    [single]      grants_template5x         : Grants for sandboxes up to 5.6
    [single]      my_template               : Prefix script to run every my* command line tool
    [single]      show_binlog_template      : Shows a binlog for a single sandbox
    [single]      use_template              : Invokes the MySQL client with the appropriate options
    [single]      clear_template            : Remove all data from a single sandbox
    [single]      restart_template          : Restarts the database (with optional mysqld arguments)
    [single]      start_template            : starts the database in a single sandbox (with optional mysqld arguments)
    [single]      stop_template             : Stops a database in a single sandbox
    [single]      send_kill_template        : Sends a kill signal to the database
    [single]      show_relaylog_template    : Show the relaylog for a single sandbox
    [single]      Copyright                 : Copyright for every sandbox script
    [single]      expose_dd_tables          : Commands needed to enable data dictionary table usage
    [single]      init_db_template          : Initialization template for the database
    [single]      grants_template8x         : Grants for sandboxes from 8.0+
    [single]      add_option_template       : Adds options to the my.sandbox.cnf file and restarts
    [single]      test_sb_template          : Tests basic sandbox functionality
    [single]      sb_include_template       : TBD
    [single]      gtid_options              : GTID options for my.cnf
    [single]      my_cnf_template           : Default options file for a sandbox
    [single]      status_template           : Shows the status of a single sandbox

Then it's possible to examine template contents:

$ dbdeployer templates describe --with-contents init_db_template
# Collection    : single
# Name          : init_db_template
# Description   : Initialization template for the database
# Notes         : This should normally run only once
# Length        : 656
##START init_db_template
#!/bin/bash
        {{.Copyright}}
        # Generated by dbdeployer {{.AppVersion}} using {{.TemplateName}} on {{.DateTime}}
        BASEDIR={{.Basedir}}
        export LD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$LD_LIBRARY_PATH
        export DYLD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$DYLD_LIBRARY_PATH
        SBDIR={{.SandboxDir}}
        DATADIR=$SBDIR/data
        cd $SBDIR
        if [ -d $DATADIR/mysql ]
        then
            echo "Initialization already done."
            echo "This script should run only once."
            exit 0
        fi

        {{.InitScript}} \
            {{.InitDefaults}} \
            --user={{.OsUser}} \
            --basedir=$BASEDIR \
            --datadir=$DATADIR \
            --tmpdir={{.Tmpdir}} {{.ExtraInitFlags}}

##END init_db_template

The one above is the template that generates the initialization script. In MySQL-Sandbox, this was handled in the code, and it was difficult to figure out what went wrong when the initialization failed. The Go language has an excellent support for code generation using templates, and with just a fraction of its features I implemented a few dozen scripts which I am able to modify with ease. Here's what the deployed script looks like

#!/bin/bash

#    DBDeployer - The MySQL Sandbox
#    Copyright (C) 2006-2018 Giuseppe Maxia
#
#    Licensed under the Apache License, Version 2.0 (the "License");
#    you may not use this file except in compliance with the License.
#    You may obtain a copy of the License at
#
#        http://www.apache.org/licenses/LICENSE-2.0
#
#    Unless required by applicable law or agreed to in writing, software
#    distributed under the License is distributed on an "AS IS" BASIS,
#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#    See the License for the specific language governing permissions and
#    limitations under the License.

# Generated by dbdeployer 0.1.24 using init_db_template on Tue Feb 20 14:45:29 CET 2018
BASEDIR=$HOME/opt/mysql/8.0.4
export LD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$DYLD_LIBRARY_PATH
SBDIR=$HOME/sandboxes/msb_8_0_4
DATADIR=$SBDIR/data
cd $SBDIR
if [ -d $DATADIR/mysql ]
then
 echo "Initialization already done."
 echo "This script should run only once."
 exit 0
fi

$HOME/opt/mysql/8.0.4/bin/mysqld \
    --no-defaults \
    --user=$USER \
    --basedir=$BASEDIR \
    --datadir=$DATADIR \
    --tmpdir=$HOME/sandboxes/msb_8_0_4/tmp \
    --initialize-insecure --default_authentication_plugin=mysql_native_password

Let's see the quick-and-dirty usage. If you want to change a template and use it just once, do the following:

  1. $ dbdeployer templates show init_db_template
  2. Save it to a file init_db.txt and edit it. Be careful, though: removing or altering essential labels may block the sandbox initialization.
  3. Use the template file in the next command:

$ dbdeployer single 8.0.4 --use-template=init_db_template:init_db.txt

For more permanent results, when you'd like to change a template, or several ones, permanently, you can use the export/import commands


  1. List the templates related to replication (dbdeployer templates list replication)
  2. Export the templates to the directory "mydir" $ dbdeployer templates export replication mydir
  3. edit the templates you want to change inside "mydir/replication"
  4. Import the templates dbdeployer templates import replication mydir

The templates will end inside $HOME/.dbdeployer/templates_$DBDEPLOYER_VERSION and dbdeployer will load then instead of using the ones stored internally. The next time that one of those templates will be needed, it will be collected from the file. If you run dbdeployer templates list or describe, the ones saved to file will be marked with {F}.
To go back to the built-in behavior, simply run dbdeployer templates reset

In addition to templates, dbdeployer uses a set of values when creating sandboxes. Like templates, this set is used from internal store, but it can be exported to a configuration file.

$ dbdeployer admin show
# Internal values:
{
  "version": "0.1.24",
  "sandbox-home": "$HOME/sandboxes",
  "sandbox-binary": "$HOME/opt/mysql",
  "master-slave-base-port": 11000,
  "group-replication-base-port": 12000,
  "group-replication-sp-base-port": 13000,
  "multiple-base-port": 16000,
  "group-port-delta": 125,
  "sandbox-prefix": "msb_",
  "master-slave-prefix": "rsandbox_",
  "group-prefix": "group_msb_",
  "group-sp-prefix": "group_sp_msb_",
  "multiple-prefix": "multi_msb_"
}

The values named *-base-port are used to calculate the port for each node in a multiple deployment. The calculation goes:

sandbox_port + base_port + (revision_number * 100)

So, for example, when deploying replication for 5.7.21, the sandbox port would be 5721, and the final base port will be calculated as follows:

5721 + 11000 + 21 * 100 = 18821

This number will be incremented for each node in the cluster, so that the master will get 18822, and the first slave 18823.

Using the commands dbdeployer admin export and import you can customize the default values in a way similar to what we saw for the templates.


Thanks


I'd like to thank:


A note about unpacking MySQL tarball

When using MySQL tarballs, we may have some problems due to the enormous size that the tarballs have reached. Look at this:

690M    5.5.52
1.2G    5.6.39
2.5G    5.7.21
3.6G    8.0.0
1.3G    8.0.1
1.5G    8.0.2
1.9G    8.0.3
1.9G    8.0.4

This becomes a serious problem when you want to unpack the tarball inside a low-resource virtual machine or a Docker container. I have asked the MySQL team to provide reduced tarballs, possibly in a fixed location, so that sandboxes creation could be fully automated. I was told that something will be done soon. In the meantime, I provide such reduced tarballs, which have a more reasonable size:

 49M    5.5.52
 61M    5.6.39
346M    5.7.21
447M    8.0.0
462M    8.0.1
254M    8.0.2
270M    8.0.3
244M    8.0.4

Using these reduced tarballs, which are conveniently packed in a docker container (datacharmer/mysql-sb-full contains all major MySQL versions), I have automated dbdeployer tests with minimal storage involvement, and that improves the test speed as well.

Detailed list of features


Feature MySQL-Sandbox dbdeployer dbdeployer planned
Single sandbox deployment yes yes
unpack command sort of 1 yes
multiple sandboxes yes yes
master-slave replication yes yes
"force" flag yes yes
pre-post grants SQL action yes yes
initialization options yes yes
my.cnf options yes yes
custom my.cnf yes yes
friendly UUID generation yes yes
global commands yes yes
test replication flow yes yes
delete command yes 2 yes
group replication SP no yes
group replication MP no yes
prevent port collision no yes 3
visible initialization no yes 4
visible script templates no yes 5
replaceable templates no yes 6
configurable defaults no yes 7
list of source binaries no yes 8
list of installed sandboxes no yes 9
test script per sandbox no yes 10
integrated usage help no yes 11
custom abbreviations no yes 12
version flag no yes 13
fan-in no no yes 14
all-masters no no yes 15
Galera/PXC/NDB no no yes 18
finding free ports yes no yes
pre-post grants shell action yes no maybe
getting remote tarballs yes no yes
circular replication yes no no 16
master-master (circular) yes no no
Windows support no no no 17


  1. It's achieved using --export_binaries and then abandoning the operation. 
  2. Uses the sbtool command 
  3. dbdeployer sandboxes store their ports in a description JSON file, which allows the tool to get a list of used ports and act before a conflict happens. 
  4. The initialization happens with a script that is generated and stored in the sandbox itself. Users can inspect the init_db script and see what was executed. 
  5. All sandbox scripts are generated using templates, which can be examined and eventually changed and re-imported. 
  6. See also note 5. Using the flag --use-template you can replace an existing template on-the-fly. Group of templates can be exported and imported after editing. 
  7. Defaults can be exported to file, and eventually re-imported after editing.  
  8. This is little more than using an O.S. file listing, with the added awareness of the source directory. 
  9. Using the description files, this command lists the sandboxes with their topology and used ports. 
  10. It's a basic test that checks whether the sandbox is running and is using the expected port. 
  11. The "usage" command will show basic commands for single and multiple sandboxes. 
  12. The abbreviations file allows user to define custom shortcuts for frequently used commands. 
  13. Strangely enough, this simple feature was never implemented for MySQL-Sandbox, while it was one of the first additions to dbdeployer. 
  14. Will use the multi source technology introduced in MySQL 5.7. 
  15. Same as n. 13. 
  16. Circular replication should not be used anymore. There are enough good alternatives (multi-source, group replication) to avoid this old technology. 
  17. I don't do Windows, but you can fork the project if you do. 
  18. For Galera/PXC and MySQL Cluster I have ideas, but I may need help to implement. 

2 comments:

Anonymous said...

Can you please add to the comparison chart your plans for Galera/XtraDB cluster?

Giuseppe Maxia said...

@Shlomi,
Done. I have some fuzzy ideas about PXC/Galera and MySQL Cluster. I need some help from somebody expert in the matter to make it work. So far, I am not even sure it can be done at all.