Sunday, March 11, 2018

Concurrent sandbox deployment


Version 0.3.0 of dbdeployer has gained the ability of deploying multiple sandboxes concurrently. Whenever we deploy a group of sandboxes (replication, multiple) we can use the --concurrent flag, telling dbdeployer that it should run operations concurrently.

What happens when a single sandbox gets deployed? There are six sets of operations:

  1. Create the sandbox directory and write down its scripts;
  2. Run the initialisation script;
  3. Start the database server;
  4. Run the pre-grants SQL commands (if any;)
  5. Load the grants;
  6. Run the post-grants SQL commands (if any;)

When several sandboxes are deployed concurrently, dbdeployer runs only the first step, and then creates a list of commands with an associated priority index. These commands are assembled for every sandbox, and then executed concurrently for every step.
The sequence of events for a deployment of three sandboxes in replication would be like this:

  1. Create the sandbox skeleton for every sandbox;
  2. Initialise all database servers;
  3. start all the servers;
  4. run the pre-grants, grants, post-grants scripts.
  5. Runs the group initialisation script (start master and slaves, or setup group replication).

Depending on the computer architecture, the server version, and the number of nodes, the speed of deployment can increase from 2 to 5 times.

Let's see an example:

$ time dbdeployer deploy replication 5.7.21
[...]
real    0m13.789s
user    0m1.143s
sys 0m1.873s

$ time dbdeployer deploy replication 5.7.21 --concurrent
[...]
real    0m7.780s
user    0m1.329s
sys 0m1.811s

There is a significant speed increase. The gain rises sharply if we use an higher number of nodes.

$ time dbdeployer deploy replication 5.7.21 --nodes=5
[...]
real    0m23.425s
user    0m1.923s
sys 0m3.106s

$ time dbdeployer deploy replication 5.7.21 \
    --nodes=5 --concurrent
[...]
real    0m7.686s
user    0m2.248s
sys 0m2.777s

As we can see, the time for deploying 5 nodes is roughly the same used for 3 nodes. While the sequential operations take time proportionally with the number of nodes, the concurrent task stays almost constant.

Things a re a bit different for group replication, as the group initialisation (which happens after all the servers are up and running) takes more time than the simple master/slave deployment, and can't be easily reduced using the current code.

A similar optimisation happens when we delete multiple sandboxes. Here the operation is at sandbox level (1 replication cluster = 1 sandbox) not at server level, and for that reason the gain is less sharp. Still, operations are noticeably faster.

There is room for improvement, but I have seen that the total testing time for dbdeployer test suite has dropped from 26 to 15 minutes. I think it was a week end well spent.

No comments: