Friday, March 08, 2019

dbdeployer community - Part 2: Percona XtraDB Cluster

This was not on the radar. I have never been proficient in Galera clusters and related technologies, and thus I hadn’t given much thought to Percona XtraDB Cluster (PXC), until Alkin approached me at FOSDEM and proposed to extend dbdeployer features to support PXC. He mentioned that many support engineers at Percona use dbdeployer on a daily basis and that the addition of PXC would be welcome.

I could not follow up much during the conference, but we agreed on making a proof-of-concept in an indirect way: if several nodes of PXC can run in the same host using shell scripts, dbdeployer could reproduce that behavior.

A few weeks later, when dbdeployer had already been enhanced with flavors and capabilities, I got the script that can deploy several nodes in the same host. It’s a simplification of the ones used in Percona PXC tests, which got me started.

I followed a method similar to the one I used for MySQL Group Replication. The technology is similar, although the MySQL Team used a different approach for the installation. The basic principle is that the cluster needs two ports per node: in addition to the regular MySQL port, there is a communication port (the SST, or State Snapshot Transfer, port) that is needed to exchange cluster data. Using this information, and following the sample in the script, I could produce a prototype that surprisingly worked on the first try!
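To make the two-port principle concrete, here is a hedged sketch of the Galera-related settings involved. The option names are real wsrep settings, but the addresses and port numbers are purely illustrative, not the values dbdeployer generates:

```ini
# Illustrative my.cnf fragment for one PXC node (values made up;
# dbdeployer computes the real ones for each node).
[mysqld]
port = 26226                                 # regular MySQL port
# group communication endpoint for the cluster
wsrep_provider_options = "gmcast.listen_addr=tcp://127.0.0.1:26352"
# address on which this node receives cluster data during a state transfer
wsrep_sst_receive_address = 127.0.0.1:26364
```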

The cluster did deploy, and the replication test, which comes free of charge when you implement a replication-type sandbox using standard templates, worked flawlessly.

Then I hooked the deployment method into dbdeployer's concurrency engine, which is able to deploy several nodes at once. Here I hit the first problem. In PXC, the nodes are not equal at startup. The first node needs to be initialised without the other nodes' addresses, and it becomes the reference for the other nodes to join the cluster. If I provided complete references for all nodes (as I do for MySQL Group Replication), it didn’t work.
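The asymmetry boils down to the cluster address each node receives at startup. A hedged sketch, with illustrative addresses rather than dbdeployer's actual generated values:

```ini
# Node 1 bootstraps with an empty cluster address and becomes the
# reference point for the cluster:
wsrep_cluster_address = gcomm://

# Nodes 2 and 3 point at the first node's communication port
# (illustrative address) and join once it is online:
wsrep_cluster_address = gcomm://127.0.0.1:26352
```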

After some talk with Percona engineers on Slack, I figured out that the nodes can be deployed together, and the second and third nodes will just wait for the first one to come online and then join. That worked in principle, i.e. when I deployed the nodes sequentially, but not when they were deployed all at once. Fortunately, dbdeployer has several ways of enabling debugging output, and after a few unsuccessful attempts I found the reason: PXC initialisation happens using rsync on port 4444. When the nodes are started sequentially, the receiving node takes control of port 4444 without conflicts, gets the job done, and releases the port. When we deploy all nodes at once, there is a race for possession of the synchronisation port, and a random node wins it, leaving the others waiting forever.

Thus, I modified the installation to allocate a different rsync port for each node, and after that the concurrent installation worked as well.
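With rsync as the SST method, Galera lets each node declare its own receiving address instead of falling back to the default port 4444. A hedged sketch of the per-node settings (the ports are illustrative):

```ini
# Giving each node a distinct rsync receiving port removes the race
# on the default port 4444 during concurrent deployment.
wsrep_sst_method = rsync
# node1: wsrep_sst_receive_address = 127.0.0.1:26364
# node2: wsrep_sst_receive_address = 127.0.0.1:26365
# node3: wsrep_sst_receive_address = 127.0.0.1:26366
```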

The last obstacle was the discovery that there is yet another port (the IST, or Incremental State Transfer, port), which is always one number bigger than the SST port. Thus, if the SST port is, say, 5555, the IST port is set to 5556. This means that, unlike other dbdeployer clusters, I can’t assign port numbers incrementally, but need to set them with an interval. I did that, and the cluster came with a default allocation of four ports per node (MySQL, rsync, SST, IST). If we also enable MySQLX, which comes included since PXC binaries are based on MySQL 5.7, we get 5 ports per node, and a majestic 15 ports for a three-node cluster.

Anyway, the support for Percona XtraDB Cluster is available in dbdeployer 1.21.0. Let’s see a sample session to use the new functionality.

$ dbdeployer --version
dbdeployer version 1.21.0

$ dbdeployer unpack --prefix=pxc ~/downloads/Percona-XtraDB-Cluster-5.7.25-rel28-31.35.1.Linux.x86_64.ssl100.tar.gz  
Renaming directory $HOME/opt/mysql/Percona-XtraDB-Cluster-5.7.25-rel28-31.35.1.Linux.x86_64.ssl100 to $HOME/opt/mysql/pxc5.7.25

Before trying the cluster, it is good practice to make sure that your system can deploy a single node.

$ dbdeployer deploy single pxc5.7.25  
Database installed in $HOME/sandboxes/msb_pxc5_7_25  
run 'dbdeployer usage single' for basic instructions'  
. sandbox server started

$ $HOME/sandboxes/msb_pxc5_7_25/test_sb  
# Tests :    11  
# PASS  :    11  
# fail  :     0

$ dbdeployer delete msb_pxc5_7_25  

And now for the real test:

$ dbdeployer deploy replication --topology=pxc pxc5.7.25  
Installing and starting node 1  
. sandbox server started  
Installing and starting node 2  
...... sandbox server started  
Installing and starting node 3  
..... sandbox server started  
Replication directory installed in $HOME/sandboxes/pxc_msb_pxc5_7_25  
run 'dbdeployer usage multiple' for basic instructions'

We should now see all the allocated ports.

$ dbdeployer sandboxes --header  
            name                       type             version                                      ports  
---------------------------- ------------------------ ----------- ----------------------------------------------------------------------------  
 pxc_msb_pxc5_7_25        :   Percona-Xtradb-Cluster   pxc5.7.25   [26226 26352 26353 26364 26227 26354 26355 26365 26228 26356 26357 26366 ]
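The twelve ports follow the spacing rule described earlier: the MySQL and rsync ports advance by one per node, while the SST ports advance by two, because each IST port must be SST + 1. A small sketch reproduces the layout; the base ports are taken from the listing above, but the computation itself is an illustration, not dbdeployer's actual code:

```shell
#!/bin/sh
# Sketch of the port layout for this 3-node PXC cluster (base ports
# taken from the 'dbdeployer sandboxes' output; logic is illustrative).
mysql_base=26226
sst_base=26352    # SST ports are spaced by 2 so that IST (= SST + 1) fits
rsync_base=26364
for node in 1 2 3; do
  mysql_port=$((mysql_base + node - 1))
  sst_port=$((sst_base + 2 * (node - 1)))
  ist_port=$((sst_port + 1))
  rsync_port=$((rsync_base + node - 1))
  echo "node $node: mysql=$mysql_port sst=$sst_port ist=$ist_port rsync=$rsync_port"
done
# → node 1: mysql=26226 sst=26352 ist=26353 rsync=26364
# → node 2: mysql=26227 sst=26354 ist=26355 rsync=26365
# → node 3: mysql=26228 sst=26356 ist=26357 rsync=26366
```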

If we want more detail, we can look at the sandbox description file:

$ cat $HOME/sandboxes/pxc_msb_pxc5_7_25/sbdescription.json  
{  
    "basedir": "$HOME/opt/mysql/pxc5.7.25",  
    "type": "Percona-Xtradb-Cluster",  
    "version": "pxc5.7.25",  
    "flavor": "pxc",  
    "port": [  
        26226,  
        26352,  
        26353,  
        26364,  
        26227,  
        26354,  
        26355,  
        26365,  
        26228,  
        26356,  
        26357,  
        26366  
    ],  
    "nodes": 3,  
    "node_num": 0,  
    "dbdeployer-version": "1.21.0",  
    "timestamp": "Thu Mar  7 17:20:03 CET 2019",  
    "command-line": "dbdeployer deploy replication --topology=pxc pxc5.7.25"  
}

Now we can run the replication test. Given that we have a cluster where all nodes are masters, the test will create a table in each node and read the results on each slave (again, each node):

$ $HOME/sandboxes/pxc_msb_pxc5_7_25/test_replication  
# master 1  
# master 2  
# master 3  
# slave 1  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 2  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 3  
ok - '3' == '3' - Slaves received tables from all masters  
# pass: 3  
# fail: 0

It’s a simple test, but it tells us that the cluster is fully functional.

Thanks to Alkin and other Percona engineers who have tested the prototype in real time.

1 comment:

Unknown said...

Great job Giuseppe, works like a charm! :)

Btw, if anyone has problems initialising the first node, with not even the error log created in the data directory, try calling the unpacked mysqld binary directly to check whether it throws OpenSSL library version related errors.