Sunday, September 24, 2017

Revisiting roles in MySQL 8.0

In my previous article about roles I said that one of the problems with role usage is that roles need to be activated before they kick in. Let's recap briefly what the problem is:

## new session, as user `root`

mysql [localhost] {root} ((none)) > create role viewer;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > grant select on *.* to viewer;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > create user see_it_all identified by 'msandbox';
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > grant viewer to see_it_all;
Query OK, 0 rows affected (0.01 sec)

## NEW session, as user `see_it_all`

mysql [localhost] {see_it_all} ((none)) > use test
ERROR 1044 (42000): Access denied for user 'see_it_all'@'%' to database 'test'

mysql [localhost] {see_it_all} ((none)) > show grants\G
*************************** 1. row ***************************
Grants for see_it_all@%: GRANT USAGE ON *.* TO `see_it_all`@`%`
*************************** 2. row ***************************
Grants for see_it_all@%: GRANT `viewer`@`%` TO `see_it_all`@`%`
2 rows in set (0.00 sec)

mysql [localhost] {see_it_all} (test) > select current_role();
+----------------+
| current_role() |
+----------------+
| NONE           |
+----------------+
1 row in set (0.00 sec)

We can create a simple role that gives read-only access to most database objects, and assign it to a user. However, when the new user tries to access a database, the access is denied. The problem is that the role must be activated, either permanently or for the current session.

mysql [localhost] {see_it_all} ((none)) > set role viewer;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {see_it_all} (test) > select current_role();
+----------------+
| current_role() |
+----------------+
| `viewer`@`%`   |
+----------------+
1 row in set (0.00 sec)

mysql [localhost] {see_it_all} ((none)) > use test
Database changed

mysql [localhost] {see_it_all} (test) > show grants\G
*************************** 1. row ***************************
Grants for see_it_all@%: GRANT SELECT ON *.* TO `see_it_all`@`%`
*************************** 2. row ***************************
Grants for see_it_all@%: GRANT `viewer`@`%` TO `see_it_all`@`%`
2 rows in set (0.00 sec)

The main issue here is that the role is not active immediately. If we grant a given privilege to a user, the user can operate under that privilege straight away. If we grant a role instead, the user can't use it right away: roles need to be activated, either by the giver or by the receiver.
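
Both kinds of permanent activation are sketched below, using the accounts above: the giver can assign a default role with SET DEFAULT ROLE, and the same effect can be achieved with ALTER USER.

## as user `root`

SET DEFAULT ROLE viewer TO see_it_all;
-- equivalent form:
ALTER USER see_it_all DEFAULT ROLE viewer;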


Auto-activating roles


In MySQL 8.0.2 there are two new features related to roles, and one of them addresses the main problem we have just seen. When activate_all_roles_on_login is enabled, all roles become active when the user starts a session, regardless of any pre-existing role activation. Let's try it. In the previous example, as root, we issue this command:


mysql [localhost] {root} ((none)) > set global activate_all_roles_on_login=ON;
Query OK, 0 rows affected (0.00 sec)
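
Being a plain global variable, this setting would be lost at the next restart. With SET PERSIST (another 8.0 addition, which we will meet again later in this article) the setting survives restarts:

SET PERSIST activate_all_roles_on_login=ON;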

Then, we connect as user see_it_all

mysql [localhost] {see_it_all} ((none)) > select current_role();
+----------------+
| current_role() |
+----------------+
| `viewer`@`%`   |
+----------------+
1 row in set (0.00 sec)

mysql [localhost] {see_it_all} ((none)) > use test
Database changed

The role is active. The current role can be overridden temporarily using SET ROLE:

mysql [localhost] {see_it_all} ((none)) > use test
Database changed

mysql [localhost] {see_it_all} (test) > set role none;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {see_it_all} (test) > select current_role();
+----------------+
| current_role() |
+----------------+
| NONE           |
+----------------+
1 row in set (0.00 sec)

mysql [localhost] {see_it_all} (test) > show tables;
ERROR 1044 (42000): Access denied for user 'see_it_all'@'%' to database 'test'

This is a good option, which can further simplify a DBA's work. There are, as usual, a few caveats:

  • This option takes effect only at login, i.e. when the user starts a new session. Users that are already logged in when the option is changed will not be affected until they reconnect.
  • This option can have adverse effects when using combinations of roles. If the DBA's intent is to give users several roles that should be used separately, activate_all_roles_on_login will make that paradigm harder to apply. Let's see an example:

CREATE ROLE payroll_viewer ;
GRANT SELECT ON payroll.* TO payroll_viewer;

CREATE ROLE payroll_updater;
GRANT CREATE, INSERT, UPDATE, DELETE ON payroll.* TO payroll_updater;

CREATE ROLE personnel_viewer ;
GRANT SELECT ON personnel.* TO personnel_viewer;

CREATE ROLE personnel_updater;
GRANT CREATE, INSERT, UPDATE, DELETE ON personnel.* TO personnel_updater;

CREATE ROLE payroll;
GRANT payroll_updater, payroll_viewer, personnel_viewer to payroll;

CREATE ROLE personnel;
GRANT personnel_updater, personnel_viewer to personnel;

CREATE USER pers_user identified by 'msandbox';
CREATE USER pers_manager identified by 'msandbox';
CREATE USER pay_user identified by 'msandbox';

GRANT personnel to pers_user;
GRANT personnel, payroll_viewer to pers_manager;
GRANT payroll to pay_user;

SET DEFAULT ROLE personnel TO pers_user;
SET DEFAULT ROLE personnel TO pers_manager;
SET DEFAULT ROLE payroll TO pay_user;

In the above situation, we want the user pers_manager to see the personnel records by default, but she needs to manually activate payroll_viewer to see the payroll.
If we set activate_all_roles_on_login, pers_manager would be able to see payroll info without further action.
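
For the record, here is what the manual activation looks like: pers_manager adds the extra role on top of the default one, for the current session only.

-- as pers_manager, for this session only
SET ROLE personnel, payroll_viewer;
SELECT CURRENT_ROLE();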


Mandatory roles


Another option introduced in 8.0.2 is mandatory_roles. This variable can be set with a list of roles. When set, the roles in the list will be added to the privileges of all users, including future ones.

Here's an example of how this feature could be useful. We want a schema containing data that should be accessible to all users, regardless of their privileges.

CREATE SCHEMA IF NOT EXISTS company;

DROP TABLE IF EXISTS company.news;
CREATE TABLE company.news(
    id int not null auto_increment primary key,
    news_type ENUM('INFO', 'WARNING', 'ALERT'),
    contents MEDIUMTEXT);

DROP ROLE IF EXISTS news_reader;
CREATE ROLE news_reader;
GRANT SELECT ON company.* TO news_reader;
SET PERSIST mandatory_roles = news_reader;

In this example, every user that starts a session after mandatory_roles has been set will be able to access the "company" schema and read the news from there.
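
Note that mandatory_roles accepts a comma-separated list, so several roles can be made universal at once (the second role below is purely hypothetical):

SET PERSIST mandatory_roles = 'news_reader,company_writer';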

There are at least two side effects of this feature:

  • When a role is included in the list of mandatory roles, it can't be dropped.
mysql [localhost] {root} (mysql) > drop role news_reader;
ERROR 4527 (HY000): The role `news_reader`@`%` is a mandatory role and can't be revoked or dropped. 
The restriction can be lifted by excluding the role identifier from the global variable mandatory_roles (see the example at the end of this section).
  • Users who already have broad access that includes the privileges of the mandatory role will nonetheless see the mandatory role show up in their list of grants. For example, here is what the grants for 'root'@'localhost' look like:
mysql [localhost] {root} ((none)) > show grants \G
*************************** 1. row ***************************
Grants for root@localhost: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD,
 SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, 
 CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, 
 CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, 
 CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `root`@`localhost` WITH GRANT 
 OPTION
*************************** 2. row ***************************
Grants for root@localhost: GRANT BACKUP_ADMIN,BINLOG_ADMIN,CONNECTION_ADMIN,
 ENCRYPTION_KEY_ADMIN,GROUP_REPLICATION_ADMIN,PERSIST_RO_VARIABLES_ADMIN, 
 REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,
 SET_USER_ID,SYSTEM_VARIABLES_ADMIN,XA_RECOVER_ADMIN ON *.* TO `root`@`localhost` 
 WITH GRANT OPTION
*************************** 3. row ***************************
Grants for root@localhost: GRANT SELECT ON `company`.* TO `root`@`localhost`
*************************** 4. row ***************************
Grants for root@localhost: GRANT PROXY ON ''@'' TO 'root'@'localhost' WITH GRANT OPTION
*************************** 5. row ***************************
Grants for root@localhost: GRANT `news_reader`@`%` TO `root`@`localhost`
5 rows in set (0.00 sec)
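
As mentioned in the first side effect, to make news_reader droppable again we must take it out of the mandatory list, in this case by emptying the variable:

SET PERSIST mandatory_roles = '';
DROP ROLE news_reader;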

More gotchas

There are several commands for setting roles. One of them uses ALTER USER, while the rest use a SET command.

  • First gotcha: SET ROLE and SET DEFAULT ROLE don't take an equals sign (=). The syntax is similar to SET CHARACTER SET, not to SET variable. This is a bit confusing, because another security-related command (SET PASSWORD) requires the '=' in the assignment.

  • Now, for the really entertaining part, here's a list of commands that can give any DBA a headache.


  • SET ROLE role_name: activates the role role_name for the current session.
  • SET DEFAULT ROLE role_name TO user_name: permanently sets role_name as the user's default role.
  • SET ROLE DEFAULT: activates the default role for the current session.
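
To see the first gotcha in practice, compare the two statements below (the password value is just a placeholder):

SET ROLE viewer;               -- no '=', like SET CHARACTER SET
SET PASSWORD = 'new_password'; -- '=' required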

State of bugs


Community members have reported several bugs about roles. While I am happy with the MySQL team's response concerning the usability of roles (the automatic activation came after I gave feedback), I am less thrilled to see that none of the public bugs reported on this matter have been addressed. Bug#85561 is particularly vexing. I reported that users can be assigned non-existing roles as default. I was answered with a sophism about the inner being of a default role, and the bug report was closed with a "Won't fix" state. I disagree with this characterisation. The behaviour that I reported is a bug because it allows users to write a wrong statement without a warning or an error. I hope the team will reconsider and take action to improve the usability of roles.

Sunday, May 28, 2017

How to break MySQL InnoDB cluster

A few weeks ago I started experimenting with MySQL InnoDB cluster. As part of the testing, I tried to kill a node to see what happens to the cluster.

The good news is that the cluster is resilient. When the primary node goes missing, the cluster replaces it immediately, and operations continue. This is one of the features of a High Availability system, but this feature alone does not define the usefulness or the robustness of the system. In one of my previous jobs I worked on testing a commercial HA system, and I learned a few things about what makes a reliable system.

Armed with this knowledge, I did some more experiments with InnoDB Cluster. The test in my previous article had no greater ambition than seeing operations continue with ease after the primary node was replaced. In this article, I examine a few more features of an HA system:

  • Making sure that a failed primary node does not try to force itself back into the cluster;
  • Properly welcoming a failed node into the cluster;
  • Handling a Split Brain cluster.

To explore the above features (or their absence) we are going to simulate some mundane occurrences. We start with the same cluster seen in the previous article, using Docker InnoDB Cluster. The initial state is

{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

The first experiment is to restart a non-primary node

$ docker restart mysqlgr2

and see what happens to the cluster

$ ./tests/check_cluster.sh | grep 'primary\|address\|status'
    "primary": "mysqlgr1:3306",
    "status": "OK_NO_TOLERANCE",
    "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
            "address": "mysqlgr1:3306",
            "status": "ONLINE"
            "address": "mysqlgr2:3306",
            "status": "(MISSING)"
            "address": "mysqlgr3:3306",
            "status": "ONLINE"

The cluster detects that one member is missing. But after a few seconds, it goes back to normality:

$ ./tests/check_cluster.sh | grep 'primary\|address\|status'
    "primary": "mysqlgr1:3306",
    "status": "OK",
    "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
            "address": "mysqlgr1:3306",
            "status": "ONLINE"
            "address": "mysqlgr2:3306",
            "status": "ONLINE"
            "address": "mysqlgr3:3306",
            "status": "ONLINE"

This looks good. Now, let's do the same to the primary node

$ docker restart mysqlgr1

$ ./tests/check_cluster.sh 2 | grep 'primary\|address\|status'
    "primary": "mysqlgr2:3306",
    "status": "OK_NO_TOLERANCE",
    "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
            "address": "mysqlgr1:3306",
            "status": "(MISSING)"
            "address": "mysqlgr2:3306",
            "status": "ONLINE"
            "address": "mysqlgr3:3306",
            "status": "ONLINE"

As before, the cluster detects that a node is missing, and excludes it from the cluster. Since it was the primary node, another one becomes primary.

However, this time the node does not come back into the cluster. Checking the cluster status again after several minutes, node 1 is still reported missing. This is not a bug. This is a feature of well-behaved HA systems: a primary node that has already been replaced should not come back into the cluster automatically.

This experiment also went well. Now, for the interesting part, let's look at the Split-Brain situation.


At this moment, there are two parts of the cluster, and each one sees it in a different way. The view from the current primary node is the one reported above, and it is what we would expect: node 1 is not available. But if we ask node 1 for the cluster status, we get a different picture:

$ ./tests/check_cluster.sh 1 | grep 'primary\|address\|status'
    "primary": "mysqlgr1:3306",
    "status": "OK_NO_TOLERANCE",
    "statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
            "address": "mysqlgr1:3306",
            "status": "ONLINE"
            "address": "mysqlgr2:3306",
            "status": "(MISSING)"
            "address": "mysqlgr3:3306",
            "status": "(MISSING)"

Node 1 thinks it's the primary, with two nodes missing. Nodes 2 and 3 think that node 1 is missing.

In a sane system, the logical way to operate is to admit the failed node back into the cluster, after checking that it is safe to do so. InnoDB cluster management provides a rejoinInstance method that allows us to get an instance back:

$ docker exec -it mysqlgr2 mysqlsh --uri root@mysqlgr2:3306 -p$(cat secretpassword.txt)

mysql-js> cluster = dba.getCluster()
<Cluster:testcluster>

mysql-js> cluster.rejoinInstance('mysqlgr1:3306')
Rejoining the instance to the InnoDB cluster. Depending on the original
problem that made the instance unavailable, the rejoin operation might not be
successful and further manual steps will be needed to fix the underlying
problem.

Please monitor the output of the rejoin operation and take necessary action if
the instance cannot rejoin.

Please provide the password for 'root@mysqlgr1:3306':
Rejoining instance to the cluster ...

The instance 'root@mysqlgr1:3306' was successfully rejoined on the cluster.

The instance 'mysqlgr1:3306' was successfully added to the MySQL Cluster.

Sounds good, eh? Apparently, we have node 1 back in the fold. Let's check:

$ ./tests/check_cluster.sh 2 | grep 'primary\|address\|status'
    "primary": "mysqlgr2:3306",
    "status": "OK_NO_TOLERANCE",
    "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
            "address": "mysqlgr1:3306",
            "status": "(MISSING)"
            "address": "mysqlgr2:3306",
            "status": "ONLINE"
            "address": "mysqlgr3:3306",
            "status": "ONLINE"

Nope. Node 1 is still missing. And if we try to rescan the cluster, we see that the rejoin call was not effective:

mysql-js> cluster.rescan()
Rescanning the cluster...

Result of the rescanning operation:
{
    "defaultReplicaSet": {
        "name": "default",
        "newlyDiscoveredInstances": [],
        "unavailableInstances": [
            {
                "host": "mysqlgr1:3306",
                "label": "mysqlgr1:3306",
                "member_id": "6bd04911-4374-11e7-b780-0242ac170002"
            }
        ]
    }
}

The instance 'mysqlgr1:3306' is no longer part of the HA setup. It is either offline or left the HA group.
You can try to add it to the cluster again with the cluster.rejoinInstance('mysqlgr1:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y|n]: n

It's curious (and frustrating) that we get a recommendation to run the very same function that we attempted a minute ago.

But, just as a devilish thought, let's try the same experiment from the invalid cluster.

$ docker exec -it mysqlgr1 mysqlsh --uri root@mysqlgr1:3306 -p$(cat secretpassword.txt)

mysql-js> cluster = dba.getCluster()
<Cluster:testcluster>


mysql-js> cluster.rejoinInstance('mysqlgr2:3306')
Rejoining the instance to the InnoDB cluster. Depending on the original
problem that made the instance unavailable, the rejoin operation might not be
successful and further manual steps will be needed to fix the underlying
problem.

Please monitor the output of the rejoin operation and take necessary action if
the instance cannot rejoin.

Please provide the password for 'root@mysqlgr2:3306':
Rejoining instance to the cluster ...

The instance 'root@mysqlgr2:3306' was successfully rejoined on the cluster.

The instance 'mysqlgr2:3306' was successfully added to the MySQL Cluster.
mysql-js> cluster.status()
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            }
        }
    }
}

Now this was definitely not supposed to happen. The former failed node has invited a healthy node into its minority cluster and the operation succeeded!

The horrible part? This illegal operation succeeded in reconciling the views of node 1 and node 2. Now node 2 also thinks that node 1 is again the primary node, and node 3 (which was minding its own business and never had any accidents) is considered missing:

$ ./tests/check_cluster.sh 2 | grep 'primary\|address\|status'
    "primary": "mysqlgr1:3306",
    "status": "OK_NO_TOLERANCE",
    "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
            "address": "mysqlgr1:3306",
            "status": "ONLINE"
            "address": "mysqlgr2:3306",
            "status": "ONLINE"
            "address": "mysqlgr3:3306",
            "status": "(MISSING)"

And node 3 all of a sudden finds itself in the role of failed node, even though it had nothing to do with the previous operations:

$ ./tests/check_cluster.sh 3 | grep 'primary\|address\|status'
    "primary": "mysqlgr3:3306",
    "status": "OK_NO_TOLERANCE",
    "statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
            "address": "mysqlgr1:3306",
            "status": "(MISSING)"
            "address": "mysqlgr2:3306",
            "status": "(MISSING)"
            "address": "mysqlgr3:3306",
            "status": "ONLINE"

In short, while we were attempting to fix a split brain, we ended up with a different split brain, and an unexpected node promotion. This is clearly a bug, and I hope the MySQL team can make the system more robust.

Monday, May 08, 2017

Getting to know MySQL InnoDB cluster, the new kid in the block


InnoDB Cluster was released as GA a few weeks ago. I remember the initial announcement of the product at OOW 2016, which promised a seamless solution for replication and high availability with great ease of use. I was a bit disappointed to see that, at GA release time, InnoDB Cluster is a patchwork of three separate products (Group Replication, MySQL Router, MySQL Shell) which users have to collect and install separately.

Given this situation, I was very pleased when Matthew Lord published Docker-InnoDB-Cluster, an image for Docker that contains everything you need to get the system up and running. The associated scripts make the experience even easier: not only do we not have to hunt for components, but the cluster deployment procedure is completely automated.

Installation

The process is painless. After cloning the repository, the start script takes care of everything: it will create a network, deploy three database nodes, and fire up the router.

$ ./start_three_node_cluster.sh
Creating dedicated grnet network...
# network grnet already exists
NETWORK ID          NAME                DRIVER              SCOPE
8fa365076198        grnet               bridge              local
Bootstrapping the cluster...
12fb4bd975c2fb2e7152ed64e12d2d212bbc9f1d3b39d715ea0c73eeb37fed45
Container mysqlgr1 is up at Sun May  7 22:02:38 CEST 2017
Starting mysqlgr1 container...
Starting mysqlgr1 container...
MySQL init process done. Ready for start up.
Getting GROUP_NAME...
Adding second node...
a2b504ea1920d35b1555f65de24cd364fc1bc7a6ac87ca4eb32f4c02f5afce7c
Container mysqlgr2 is up at Sun May  7 22:02:48 CEST 2017
Starting mysqlgr2 container...
Starting mysqlgr2 container...
MySQL init process done. Ready for start up.
Adding third node...
393d46b9a1795531d99f68645087393a54b2463ef88b9b3c4cbe735c1527fe57
Container mysqlgr3 is up at Sun May  7 22:02:58 CEST 2017
Starting mysqlgr3 container...
Starting mysqlgr3 container...
MySQL init process done. Ready for start up.
Sleeping 10 seconds to give the cluster time to sync up
Adding a router...
830c3125bad70b09b057cee370ee490bcb88b1d4a1bfec347cda847942f3b56e
Container mysqlrouter1 is up at Sun May  7 22:03:17 CEST 2017
Done!
Connecting to the InnoDB cluster...

Most of the configuration (which has been simplified thanks to the usage of MySQL shell to add nodes) is handled inside the container initialization script. Just a few details are needed in the cluster deployment script to get the result.

The deployment script will also invoke the mysql shell in one of the nodes to show the status of the cluster:

Creating a Session to 'root@mysqlgr1:3306'
Classic Session successfully established. No default schema selected.
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

The above status is the result of dba.getCluster().status(), which is a convenient way of collecting a bunch of information about the cluster and presenting it in a compact JSON structure. If you enable the general log prior to running this command, you will see something like this:

select count(*) from performance_schema.replication_group_members where MEMBER_ID = @@server_uuid AND MEMBER_STATE IS NOT NULL AND MEMBER_STATE != 'OFFLINE';
select count(*) from mysql_innodb_cluster_metadata.instances where mysql_server_uuid = @@server_uuid;
SELECT @@server_uuid, VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'group_replication_primary_member';
SELECT MEMBER_STATE FROM performance_schema.replication_group_members WHERE MEMBER_ID = '0030396b-3300-11e7-a8b6-0242ac170002';
SELECT CAST(SUM(IF(member_state = 'UNREACHABLE', 1, 0)) AS SIGNED) AS UNREACHABLE,  COUNT(*) AS TOTAL FROM performance_schema.replication_group_members;
select count(*) from performance_schema.replication_group_members where MEMBER_ID = @@server_uuid AND MEMBER_STATE IS NOT NULL AND MEMBER_STATE != 'OFFLINE';
select count(*) from mysql_innodb_cluster_metadata.instances where mysql_server_uuid = @@server_uuid;
SELECT @@server_uuid, VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'group_replication_primary_member';
SELECT MEMBER_STATE FROM performance_schema.replication_group_members WHERE MEMBER_ID = '0030396b-3300-11e7-a8b6-0242ac170002';
SELECT CAST(SUM(IF(member_state = 'UNREACHABLE', 1, 0)) AS SIGNED) AS UNREACHABLE,  COUNT(*) AS TOTAL FROM performance_schema.replication_group_members;
SELECT cluster_id, cluster_name, default_replicaset, description, options, attributes FROM mysql_innodb_cluster_metadata.clusters WHERE attributes->'$.default' = true;
show databases like 'mysql_innodb_cluster_metadata';
SELECT replicaset_name, topology_type FROM mysql_innodb_cluster_metadata.replicasets WHERE replicaset_id = 7;
select count(*) from performance_schema.replication_group_members where MEMBER_ID = @@server_uuid AND MEMBER_STATE IS NOT NULL AND MEMBER_STATE != 'OFFLINE';
select count(*) from mysql_innodb_cluster_metadata.instances where mysql_server_uuid = @@server_uuid;
SELECT @@server_uuid, VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'group_replication_primary_member';
SELECT MEMBER_STATE FROM performance_schema.replication_group_members WHERE MEMBER_ID = '0030396b-3300-11e7-a8b6-0242ac170002';
SELECT CAST(SUM(IF(member_state = 'UNREACHABLE', 1, 0)) AS SIGNED) AS UNREACHABLE,  COUNT(*) AS TOTAL FROM performance_schema.replication_group_members;
SELECT @@group_replication_single_primary_mode;
SHOW STATUS LIKE 'group_replication_primary_member';
SELECT mysql_server_uuid, instance_name, role, MEMBER_STATE, JSON_UNQUOTE(JSON_EXTRACT(addresses, "$.mysqlClassic")) as host FROM mysql_innodb_cluster_metadata.instances LEFT JOIN performance_schema.replication_group_members ON `mysql_server_uuid`=`MEMBER_ID` WHERE replicaset_id = 7;

In short, these commands check the health of the cluster, which is summarized in the statusText field: we can lose up to one node and the cluster will keep working.

High Availability

What we have after deployment is a system that is highly available:

  • Group replication with one primary node;
  • Access to the cluster through the router, which provides one port for Read/write and one for Read-Only;
  • Automatic failover. When the primary node fails, another one is promoted on the spot, without any manual labor.

Let's start a test. We can check whether the data inserted from the R/W port is then retrieved by other nodes using the R/O port.

$ docker exec -it mysqlrouter1 /opt/ic/tests/test_router.sh
Server ID of current master
--------------
SELECT @@global.server_id
--------------

+--------------------+
| @@global.server_id |
+--------------------+
|                100 |
+--------------------+
Create content using router
--------------
create schema if not exists test
--------------

--------------
create table t1(id int not null primary key, name varchar(50))
--------------

--------------
insert into t1 values (1, "aaa")
--------------

The first part of the test will show the server ID of the primary node, by using the router R/W port (6446). Then it will create a table and insert one record.

Server ID of a RO node
--------------
SELECT @@global.server_id
--------------

+--------------------+
| @@global.server_id |
+--------------------+
|                200 |
+--------------------+
retrieving contents using router
--------------
SELECT * from test.t1
--------------

+----+------+
| id | name |
+----+------+
|  1 | aaa  |
+----+------+

Using the read-only port (6447), we get a different node, and we retrieve the data created in the primary node.
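
The same check can be reproduced by hand, assuming the router's ports (6446 for R/W, 6447 for R/O) are reachable from the client host:

$ mysql -h 127.0.0.1 -P 6446 -u root -p -e 'SELECT @@global.server_id'  # primary
$ mysql -h 127.0.0.1 -P 6447 -u root -p -e 'SELECT @@global.server_id'  # one of the R/O nodes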

Now we can test the high availability. Since we are using Docker, instead of simply killing the MySQL service, we can simulate an anvil falling on the server by wiping away the container:

$ docker rm -f -v mysqlgr1
mysqlgr1

The primary node is gone for good. Let's see what the cluster status says now:

$ ./tests/check_cluster.sh 2
Creating a Session to 'root@mysqlgr2:3306'
Classic Session successfully established. No default schema selected.
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr2:3306",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

There are a few differences compared to the initial report:

  • The primary is now node 2 (mysqlgr2);
  • Node 1 is marked as MISSING;
  • The cluster has lost its resilience. Unless we add another node, no further failures will be handled automatically.

We can run the router test again, and it will work just as well, with the differences reported below:

Server ID of current master
--------------
SELECT @@global.server_id
--------------

+--------------------+
| @@global.server_id |
+--------------------+
|                200 |
+--------------------+
Create content using router
--------------
create schema if not exists test
--------------

--------------
drop table if exists t1
--------------

--------------
create table t1(id int not null primary key, name varchar(50))
--------------

--------------
insert into t1 values (1, "aaa")
--------------

Server ID of a RO node
--------------
SELECT @@global.server_id
--------------

+--------------------+
| @@global.server_id |
+--------------------+
|                300 |
+--------------------+

We see that the primary now has ID 200, and the R/O node is 300 (the only other node that has survived).

Summarizing

  • The good

    • I can see that some of the ease of use promised in San Francisco is already available. We can create a cluster with little effort.
    • The recovery from the master failure is transparent.
    • The cluster status gives clear information about the system.
  • The bad

    • MySQL Shell is difficult to use. The command-line help is insufficient: some options require trial and error to work correctly. It also does not use an options file like other MySQL clients do.
    • Adding a node after the primary has become unavailable is harder than it should be, and the manual does not cover this case. It only mentions a server that can be restarted.
    • Restarting the router after the primary died is impossible with the current configuration.
    • The metadata for replication is now in three different schemas: mysql, performance_schema, and mysql_innodb_cluster_metadata. I understand the reasons, but I believe that a simplification would be possible.
  • The bottom line: quite good to start a cluster, but not enough to deal effectively with simple HA cases. Possibly released too early.

Sunday, April 30, 2017

Revisiting the hidden MySQL 8.0 data dictionary tables

A few months ago I wrote about showing the hidden tables in the MySQL 8 data dictionary. What I presented there was a naïve solution, which I am now able to improve with a few (bold) moves. In the solution given in my previous article, I was able to guess the table names somehow, but they were still hidden from the catalog. I was not thinking clearly then: I should have used the data dictionary itself to see the tables. Here, in addition to getting the real list of tables, I offer a feature to unhide them permanently.

MySQL-Sandbox 3.2.08 now has the ability to un-hide the data dictionary tables and keep them available for inspection. This feature came to my mind after a chat with the MySQL team during PerconaLive 2017. They stressed the reason for hiding the tables, which is that they want the freedom to change the tables if needed, without being constrained by what is published. They also said that if there is something missing in the information_schema views, we could ask for an enhancement. I thought immediately that asking for I_S view enhancements would be easier if we could see the original tables. In the interest of science, then, I added an option --expose_dd_tables to MySQL-Sandbox, which starts the server with the changes needed to see and use the data dictionary tables.


$ make_sandbox 8.0.1 -- --expose_dd_tables

[...]

$ ~/sandboxes/msb_8_0_1/use mysql
mysql [localhost] {msandbox} (mysql) > select tables.name from mysql.tables inner join sys.dd_hidden_tables using (id);
+------------------------------+
| name                         |
+------------------------------+
| version                      |
| collations                   |
| tablespaces                  |
| tablespace_files             |
| catalogs                     |
| schemata                     |
| st_spatial_reference_systems |
| tables                       |
| view_table_usage             |
| view_routine_usage           |
| columns                      |
| indexes                      |
| index_column_usage           |
| column_type_elements         |
| foreign_keys                 |
| foreign_key_column_usage     |
| table_partitions             |
| table_partition_values       |
| index_partitions             |
| table_stats                  |
| index_stats                  |
| events                       |
| routines                     |
| parameters                   |
| parameter_type_elements      |
| triggers                     |
| character_sets               |
+------------------------------+
27 rows in set (0.00 sec)

This is, without a doubt, the complete list of hidden tables.

As you can infer from the query above, MySQL-Sandbox adds a table to the sys schema with the list of hidden tables.
If you want to hide the tables again, you can run this:

mysql [localhost] {msandbox} (mysql) > update mysql.tables set hidden=1 
    where id in (select id from sys.dd_hidden_tables);
Query OK, 27 rows affected (0.04 sec)
Rows matched: 27  Changed: 27  Warnings: 0

But of course we want the tables to stay visible, so that we can peruse the data dictionary tables at will and find out whether there is valuable information missing from the information_schema views.

How does this work?

I used the same hack defined in my previous post, combined with a new feature of MySQL 8.0 (SET PERSIST) that allows me to keep the special option enabled across restarts. Unlike the solution in my previous post, though, users only need to pass the start option (--expose_dd_tables), without remembering obscure debug sequences.
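
For the curious, here is a minimal sketch of the idea, assuming a debug-enabled binary (the debug keyword below is the one referenced in my previous post, stated here as an assumption):

-- allow DML on data dictionary tables (debug builds only)
SET SESSION debug = '+d,skip_dd_table_access_check';
-- un-hide the tables: the reverse of the re-hide statement shown above
UPDATE mysql.tables SET hidden = 0
    WHERE id IN (SELECT id FROM sys.dd_hidden_tables);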

Happy hacking!

Saturday, March 25, 2017

MySQL 8.0 roles

One of the most interesting features introduced in MySQL 8.0 is roles, i.e. the ability to define a set of privileges as a named role and then grant that set to one or more users. The main benefits are more clarity of privileges and ease of administration. Using roles, we can assign the same set of privileges to several users, and later modify or revoke all those privileges at once.


Roles in a nutshell

Looking at the manual, we see that using roles is a matter of several steps.

(1) Create a role. The statement is similar to CREATE USER, though the effects are slightly different (we will see this in more detail later on).

mysql [localhost] {root} ((none)) > CREATE ROLE r_lotr_dev;
Query OK, 0 rows affected (0.02 sec)

(2) Grant privileges to the role. Again, this looks like granting privileges to a user.

mysql [localhost] {root} ((none)) > GRANT ALL on lotr.* TO r_lotr_dev;
Query OK, 0 rows affected (0.01 sec)

(3) Create a user. This is the same as what we've been doing up to version 5.7.

mysql [localhost] {root} (mysql) > create user aragorn identified by 'lotrpwd';
Query OK, 0 rows affected (0.01 sec)

Notice that the role is in the mysql.user table, and looks a lot like a user.

mysql [localhost] {root} ((none)) > select host, user, authentication_string from mysql.user where user not like '%sandbox%';
+-----------+-------------+-------------------------------------------+
| host      | user        | authentication_string                     |
+-----------+-------------+-------------------------------------------+
| %         | r_lotr_dev  |                                           | 
| %         | aragorn     | *3A376D0203958F6EB9E6166DC048EC04F84C00B9 |
| localhost | mysql.sys   | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| localhost | root        | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
+-----------+-------------+-------------------------------------------+

(4) Grant the role to the user. Instead of granting single privileges, we grant the role. Note that when we use this syntax we can't specify the ON xxx clause, because it is already implicit in the role definition.

mysql [localhost] {root} (mysql) > grant r_lotr_dev to aragorn;
Query OK, 0 rows affected (0.03 sec)

The relationship between user and role is recorded in a new table in the mysql database:

mysql [localhost] {root} (mysql) > select * from mysql.role_edges;
+-----------+------------+---------+---------+-------------------+
| FROM_HOST | FROM_USER  | TO_HOST | TO_USER | WITH_ADMIN_OPTION |
+-----------+------------+---------+---------+-------------------+
| %         | r_lotr_dev | %       | aragorn | N                 |
+-----------+------------+---------+---------+-------------------+

(5) Finally we set the default role. Until this point, the role is assigned to the user, but not active. We either set the default role permanently (as done below) or we let the user activate the role.

mysql [localhost] {root} (mysql) > alter user aragorn default role r_lotr_dev;
Query OK, 0 rows affected (0.01 sec)

If a default role has been set, it is recorded in another new table, mysql.default_roles.

mysql [localhost] {root} (mysql) > select * from mysql.default_roles;
+------+---------+-------------------+-------------------+
| HOST | USER    | DEFAULT_ROLE_HOST | DEFAULT_ROLE_USER |
+------+---------+-------------------+-------------------+
| %    | aragorn | %                 | r_lotr_dev        |
+------+---------+-------------------+-------------------+

Common gotchas

If we follow all the steps described above, using roles does not feel any different from using old-style grants. But it is easy to go astray if we skip something. Let's see an example. First, we create a user and grant it the same role as the one given to user aragorn:

mysql [localhost] {root} (mysql) > create user legolas identified by 'lotrpwd';
Query OK, 0 rows affected (0.03 sec)

mysql [localhost] {root} (mysql) > grant r_lotr_dev to legolas;
Query OK, 0 rows affected (0.01 sec)

Then we connect using user legolas:

mysql [localhost] {legolas} ((none)) > use lotr;
ERROR 1044 (42000): Access denied for user 'legolas'@'%' to database 'lotr'
mysql [localhost] {legolas} ((none)) > show grants;
+-----------------------------------------+
| Grants for legolas@%                    |
+-----------------------------------------+
| GRANT USAGE ON *.* TO `legolas`@`%`     |
| GRANT `r_lotr_dev`@`%` TO `legolas`@`%` |
+-----------------------------------------+
2 rows in set (0.00 sec)

mysql [localhost] {legolas} ((none)) > show grants for legolas using r_lotr_dev;
+---------------------------------------------------+
| Grants for legolas@%                              |
+---------------------------------------------------+
| GRANT USAGE ON *.* TO `legolas`@`%`               |
| GRANT ALL PRIVILEGES ON `lotr`.* TO `legolas`@`%` |
| GRANT `r_lotr_dev`@`%` TO `legolas`@`%`           |
+---------------------------------------------------+
3 rows in set (0.00 sec)

The role is assigned to the user, but it is not active, as no default role was defined. To use a role, the user must activate one:

mysql [localhost] {legolas} ((none)) > select current_role();
+----------------+
| current_role() |
+----------------+
| NONE           |
+----------------+
1 row in set (0.01 sec)

There is no active role for legolas. We need to activate one.

mysql [localhost] {legolas} ((none)) > set role r_lotr_dev;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {legolas} ((none)) > select current_role();
+------------------+
| current_role()   |
+------------------+
| `r_lotr_dev`@`%` |
+------------------+
1 row in set (0.00 sec)

Now the role is active, and all its privileges kick in:

mysql [localhost] {legolas} ((none)) > use lotr
Database changed
mysql [localhost] {legolas} (lotr) > show tables;
Empty set (0.00 sec)

mysql [localhost] {legolas} (lotr) > create table t1 (i int not null primary key);
Query OK, 0 rows affected (0.15 sec)

Note that the role activation is volatile. If the user reconnects, the role activation goes away:

mysql [localhost] {legolas} ((none)) > connect;
Connection id:    33
Current database: *** NONE ***

mysql [localhost] {legolas} ((none)) > select current_role();
+----------------+
| current_role() |
+----------------+
| NONE           |
+----------------+
1 row in set (0.01 sec)

For a permanent assignment, the user can use the SET DEFAULT ROLE statement:

mysql [localhost] {legolas} ((none)) > set default role r_lotr_dev to legolas;
Query OK, 0 rows affected (0.01 sec)

The above statement corresponds to ALTER USER ... DEFAULT ROLE .... Every user can set their own default role with this statement, without needing additional privileges.

mysql [localhost] {legolas} ((none)) > alter user legolas default role r_lotr_dev;
Query OK, 0 rows affected (0.01 sec)

Now if the user reconnects, the role persists:

mysql [localhost] {legolas} ((none)) > connect
Connection id:    34
Current database: *** NONE ***

mysql [localhost] {legolas} ((none)) > select current_role();
+------------------+
| current_role()   |
+------------------+
| `r_lotr_dev`@`%` |
+------------------+
1 row in set (0.01 sec)

However, if a user sets their own default role, the change will be available only in the next session, or after calling SET ROLE.

Let's try:

mysql [localhost] {legolas} ((none)) > set default role none to legolas;
Query OK, 0 rows affected (0.02 sec)

mysql [localhost] {legolas} ((none)) > select current_role();
+------------------+
| current_role()   |
+------------------+
| `r_lotr_dev`@`%` |
+------------------+
1 row in set (0.00 sec)

The role stays what it was before. This is similar to what happens with SET GLOBAL var_name vs. SET SESSION var_name: in the first case the effect persists for new sessions but is not activated immediately, while a session-level SET is immediately usable but does not persist across a new connection.

mysql [localhost] {legolas} ((none)) > connect
Connection id:    35
Current database: *** NONE ***

select current_role();
+----------------+
| current_role() |
+----------------+
| NONE           |
+----------------+
1 row in set (0.00 sec)

It's worth mentioning that SET DEFAULT ROLE implies a hidden update of a mysql table (default_roles), similar to what happens with SET PASSWORD. In both cases, a user without explicit access to the mysql database will be unable to check the effects of the operation.

Advanced role management

Dealing with one or two roles is no big deal. Using the statements seen above, we can easily see what privileges were granted to a role or a user. When we have many roles and many users, though, the overview becomes more difficult to achieve.

Before we see the complex scenario, let's have a deeper look at what constitutes a role.

mysql [localhost] {root} (mysql) > create role role1;
Query OK, 0 rows affected (0.05 sec)

mysql [localhost] {root} (mysql) > create user user1 identified by 'somepass';
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} (mysql) > select host, user, authentication_string , password_expired , account_locked from user where user in ('role1', 'user1');
+------+-------+-------------------------------------------+------------------+----------------+
| host | user  | authentication_string                     | password_expired | account_locked |
+------+-------+-------------------------------------------+------------------+----------------+
| %    | role1 |                                           | Y                | Y              |
| %    | user1 | *13883BDDBE566ECECC0501CDE9B293303116521A | N                | N              |
+------+-------+-------------------------------------------+------------------+----------------+
2 rows in set (0.00 sec)

The main difference between a user and a role is that a role is created with password_expired and account_locked set. Apart from that, a user could be used as a role and vice versa.

mysql [localhost] {root} (mysql) > alter user role1 identified by 'msandbox';
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > alter user role1 account unlock;
Query OK, 0 rows affected (0.02 sec)

Now role1 can access the database as any other user.

mysql [localhost] {root} ((none)) > grant root@'localhost' to user1;
Query OK, 0 rows affected (0.03 sec)

And user1 inherits all privileges from root, but it can access the server from any host.
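
To see the vice versa in action, user1 could activate the granted account as if it were a role (a sketch; output omitted):

-- as user1
SET ROLE `root`@`localhost`;
SELECT CURRENT_ROLE();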

Now let's see some complex usage of roles. In a typical organisation, we would define several roles to use the lotr database:

CREATE ROLE r_lotr_observer;
CREATE ROLE r_lotr_tester;
CREATE ROLE r_lotr_dev;
CREATE ROLE r_lotr_dba;

GRANT SELECT on lotr.* TO r_lotr_observer;
GRANT SELECT, INSERT, UPDATE, DELETE on lotr.* TO r_lotr_tester;
GRANT ALL on lotr.* TO r_lotr_dev;
GRANT ALL on *.* TO r_lotr_dba;

And then assign those roles to several users:

CREATE USER bilbo     IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER frodo     IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER sam       IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER pippin    IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER merry     IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER boromir   IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER gimli     IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER aragorn   IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER legolas   IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER gollum    IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER galadriel IDENTIFIED BY 'msandbox' PASSWORD EXPIRE;
CREATE USER gandalf   IDENTIFIED BY 'msandbox';

GRANT r_lotr_observer TO pippin, merry, boromir, gollum;
SET DEFAULT ROLE r_lotr_observer to pippin, merry, boromir, gollum;

GRANT r_lotr_tester TO sam, bilbo, gimli;
SET DEFAULT ROLE r_lotr_tester to sam, bilbo, gimli;

GRANT r_lotr_dev to frodo, aragorn, legolas;
SET DEFAULT ROLE r_lotr_dev to frodo, aragorn, legolas;

GRANT r_lotr_dba TO gandalf, galadriel;
SET DEFAULT ROLE r_lotr_dba to gandalf, galadriel;

Now we have 12 users with 4 different roles. Looking at the user table, we don't get a good overview:

mysql [localhost] {root} (mysql) > select host, user, authentication_string from mysql.user where user not like '%sandbox%';
+-----------+-----------------+-------------------------------------------+
| host      | user            | authentication_string                     |
+-----------+-----------------+-------------------------------------------+
| %         | aragorn         | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | bilbo           | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | boromir         | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | frodo           | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | galadriel       | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | gandalf         | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | gimli           | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | gollum          | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | legolas         | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | merry           | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | pippin          | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| %         | r_lotr_dba      |                                           |
| %         | r_lotr_dev      |                                           |
| %         | r_lotr_observer |                                           |
| %         | r_lotr_tester   |                                           |
| %         | sam             | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | mysql.sys       | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| localhost | root            | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
+-----------+-----------------+-------------------------------------------+

And even the role_edges table does not show a clear picture:

mysql [localhost] {root} (mysql) > select * from role_edges;
+-----------+-----------------+---------+-----------+-------------------+
| FROM_HOST | FROM_USER       | TO_HOST | TO_USER   | WITH_ADMIN_OPTION |
+-----------+-----------------+---------+-----------+-------------------+
| %         | r_lotr_dba      | %       | galadriel | N                 |
| %         | r_lotr_dba      | %       | gandalf   | N                 |
| %         | r_lotr_dev      | %       | aragorn   | N                 |
| %         | r_lotr_dev      | %       | frodo     | N                 |
| %         | r_lotr_dev      | %       | legolas   | N                 |
| %         | r_lotr_observer | %       | boromir   | N                 |
| %         | r_lotr_observer | %       | gollum    | N                 |
| %         | r_lotr_observer | %       | merry     | N                 |
| %         | r_lotr_observer | %       | pippin    | N                 |
| %         | r_lotr_tester   | %       | bilbo     | N                 |
| %         | r_lotr_tester   | %       | gimli     | N                 |
| %         | r_lotr_tester   | %       | sam       | N                 |
+-----------+-----------------+---------+-----------+-------------------+

Here's a better use of that table's data. Which users are using the dev role?

select to_user as users from role_edges where from_user = 'r_lotr_dev';
+---------+
| users   |
+---------+
| aragorn |
| frodo   |
| legolas |
+---------+
3 rows in set (0.00 sec)

And the testers?

select to_user as users from role_edges where from_user = 'r_lotr_tester';
+-------+
| users |
+-------+
| bilbo |
| gimli |
| sam   |
+-------+
3 rows in set (0.00 sec)

Or, even better, we could see all the roles at once:

select from_user as role, count(*) as how_many_users, group_concat(to_user) as users from role_edges group by role;
+-----------------+----------------+-----------------------------+
| role            | how_many_users | users                       |
+-----------------+----------------+-----------------------------+
| r_lotr_dba      |              2 | galadriel,gandalf           |
| r_lotr_dev      |              3 | aragorn,frodo,legolas       |
| r_lotr_observer |              4 | boromir,gollum,merry,pippin |
| r_lotr_tester   |              3 | bilbo,gimli,sam             |
+-----------------+----------------+-----------------------------+
4 rows in set (0.01 sec)

Similarly, we could list the default role for several users:

select default_role_user as default_role, group_concat(user) as users from default_roles group by default_role;
+-----------------+-----------------------------+
| default_role    | users                       |
+-----------------+-----------------------------+
| r_lotr_dba      | galadriel,gandalf           |
| r_lotr_dev      | aragorn,frodo,legolas       |
| r_lotr_observer | boromir,gollum,merry,pippin |
| r_lotr_tester   | bilbo,gimli,sam             |
+-----------------+-----------------------------+

The last two queries would be good candidates for information_schema views.
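
Until such views exist, a do-it-yourself version could be wrapped in the sys schema (a sketch; the view name is my own invention):

CREATE VIEW sys.users_per_role AS
    SELECT from_user AS role,
           COUNT(*) AS how_many_users,
           GROUP_CONCAT(to_user) AS users
    FROM mysql.role_edges
    GROUP BY from_user;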

Another candidate for an information_schema view is the list of roles, for which there is no satisfactory workaround at the moment. The best we can do is list the users that have both password_expired and account_locked set:

select host, user from mysql.user where password_expired='y' and account_locked='y';
+------+-----------------+
| host | user            |
+------+-----------------+
| %    | r_lotr_dba      |
| %    | r_lotr_dev      |
| %    | r_lotr_observer |
| %    | r_lotr_tester   |
+------+-----------------+
4 rows in set (0.00 sec)

Until a feature to differentiate users from roles is developed, it is advisable to use a naming format that helps identify roles without help from the server. In this article I am using the prefix r_, which makes listing roles easier.

mysql [localhost] {root} (mysql) > select host, user from mysql.user where user like 'r\_%';
+------+-----------------+
| host | user            |
+------+-----------------+
| %    | r_lotr_dba      |
| %    | r_lotr_dev      |
| %    | r_lotr_observer |
| %    | r_lotr_tester   |
+------+-----------------+
4 rows in set (0.00 sec)

Known bugs

  • bug#85562 Dropping a role does not remove privileges from active users.
  • bug#85561 Users can be assigned non-existing roles as default.
  • bug#85559 Dropping a role does not remove the associated default roles.
  • bug#84244 Distinguish roles and plain users in mysql.user.
  • bug#82987 SHOW CREATE USER doesn't show default role.

Summing up

Roles are a great addition to the MySQL feature set. Usability could be improved with some views in information_schema (users_for_role, default_role_for_user) and functions in the sys schema (default_role_for_user, is_role).

Perhaps some commands to activate or deactivate roles could make administration easier. The current bugs don't affect the basic functionality, but they decrease usability. I think that another round of bug fixing and user feedback would significantly improve this feature.

More about roles in my talk at PerconaLive 2017.


Sunday, January 22, 2017

MySQL Group Replication vs. Multi Source

In my previous post, we saw the usage of MySQL Group Replication (MGR) in single-primary mode. We know that Oracle does not recommend using MGR in multi-primary mode, but there is so much in the documentation and in presentations about MGR's behavior in multi-primary mode that I felt I should really give it a try, and especially compare this technology with the already existing multiple-master solution introduced in 5.7: multi-source replication.

Installation

To this end, I will set up two clusters using MySQL-Sandbox. The instructions for MGR in the manual use three nodes on the same host without MySQL-Sandbox. Here we can see that using MySQL-Sandbox simplifies operations considerably (the scripts are available on GitHub):

Group replication

# ----------------------------------------------------------------------------
#!/bin/bash
# mm_gr.sh : installs MySQL Group Replication
MYSQL_VERSION=$1
[ -z "$MYSQL_VERSION" ] && MYSQL_VERSION=5.7.17

make_multiple_sandbox --gtid --group_directory=GR $MYSQL_VERSION

if [ "$?" != "0" ] ; then exit 1 ; fi
multi_sb=$HOME/sandboxes/GR
baseport=$($multi_sb/n1 -BN -e 'select @@port')
baseport=$(($baseport+99))

port1=$(($baseport+1))
port2=$(($baseport+2))
port3=$(($baseport+3))
for N in 1 2 3
do
    myport=$(($baseport+N))
    options=(
        binlog_checksum=NONE
        log_slave_updates=ON
        plugin-load=group_replication.so
        group_replication=FORCE_PLUS_PERMANENT
        group_replication_start_on_boot=OFF
        group_replication_bootstrap_group=OFF
        transaction_write_set_extraction=XXHASH64
        report-host=127.0.0.1
        loose-group_replication_group_name="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
        loose-group_replication_local_address="127.0.0.1:$myport"
        loose-group_replication_group_seeds="127.0.0.1:$port1,127.0.0.1:$port2,127.0.0.1:$port3"
        loose-group-replication-single-primary-mode=off
    )
    $multi_sb/node$N/add_option "${options[@]}"

    user_cmd='reset master;'
    user_cmd="$user_cmd CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery';"
    $multi_sb/node$N/use -v -u root -e "$user_cmd"
done

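# Bootstrap the group from node 1, then let nodes 2 and 3 join in turn.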
START_CMD="SET GLOBAL group_replication_bootstrap_group=ON;"
START_CMD="$START_CMD START GROUP_REPLICATION;"
START_CMD="$START_CMD SET GLOBAL group_replication_bootstrap_group=OFF;"
$multi_sb/n1 -v -e "$START_CMD"
sleep 1
$multi_sb/n2 -v -e 'START GROUP_REPLICATION;'
sleep 1
$multi_sb/n3 -v -e 'START GROUP_REPLICATION;'
sleep 1
$multi_sb/use_all 'select * from performance_schema.replication_group_members'
# ----------------------------------------------------------------------------

Using this script, we get a cluster with MGR up and running. Here's a trimmed-down sample of its output:

$ ./mm_gr.sh
installing node 1
installing node 2
installing node 3
group directory installed in $HOME/sandboxes/GR
# option 'binlog_checksum=NONE' added to configuration file
# option 'log_slave_updates=ON' added to configuration file
# option 'plugin-load=group_replication.so' added to configuration file
# option 'group_replication=FORCE_PLUS_PERMANENT' added to configuration file
# option 'group_replication_start_on_boot=OFF' added to configuration file
# option 'group_replication_bootstrap_group=OFF' added to configuration file
# option 'transaction_write_set_extraction=XXHASH64' added to configuration file
# option 'loose-group_replication_group_name=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee' added to configuration file
# option 'loose-group_replication_local_address=127.0.0.1:14518' added to configuration file
# option 'loose-group_replication_group_seeds=127.0.0.1:14518,127.0.0.1:14519,127.0.0.1:14520' added to configuration file
# option 'loose-group-replication-single-primary-mode=off' added to configuration file
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

# [ ...]
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

# [...]
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

SET GLOBAL group_replication_bootstrap_group=ON
START GROUP_REPLICATION
SET GLOBAL group_replication_bootstrap_group=OFF
--------------

--------------
START GROUP_REPLICATION
--------------
START GROUP_REPLICATION
--------------

Multi-source replication

We have a similar (but much shorter) script to run multi-source replication in sandboxes.

# ----------------------------------------------------------------------------
#!/bin/bash
# mm_ms.sh : installs MySQL multi-source replication
MYSQL_VERSION=$1
[ -z "$MYSQL_VERSION" ] && MYSQL_VERSION=5.7.16

make_multiple_sandbox --gtid --group_directory=MS $MYSQL_VERSION

if [ "$?" != "0" ] ; then exit 1 ; fi
multi_sb=$HOME/sandboxes/MS

$multi_sb/use_all 'reset master'

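# On each node, set up one replication channel per remote master and start it.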
for N in 1 2 3
do
    user_cmd=''
    for node in 1 2 3
    do
        if [ "$node" != "$N" ]
        then
            master_port=$($multi_sb/n$node -BN -e 'select @@port')
            user_cmd="$user_cmd CHANGE MASTER TO MASTER_USER='rsandbox', "
            user_cmd="$user_cmd MASTER_PASSWORD='rsandbox', master_host='127.0.0.1', "
            user_cmd="$user_cmd master_port=$master_port FOR CHANNEL 'node$node';"
            user_cmd="$user_cmd START SLAVE FOR CHANNEL 'node$node';"
        fi
    done
    $multi_sb/node$N/use -v -u root -e "$user_cmd"
done
# ----------------------------------------------------------------------------

Sample run:

$ ./mm_ms.sh
installing node 1
installing node 2
installing node 3
group directory installed in $HOME/sandboxes/MS
# server: 1:
# server: 2:
# server: 3:
--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14318 FOR CHANNEL 'node2'
START SLAVE FOR CHANNEL 'node2'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14319 FOR CHANNEL 'node3'
START SLAVE FOR CHANNEL 'node3'

--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14317 FOR CHANNEL 'node1'
START SLAVE FOR CHANNEL 'node1'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14319 FOR CHANNEL 'node3'
START SLAVE FOR CHANNEL 'node3'

--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14317 FOR CHANNEL 'node1'
START SLAVE FOR CHANNEL 'node1'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14318 FOR CHANNEL 'node2'
START SLAVE FOR CHANNEL 'node2'
--------------

Simple test data

Finally, we have a script that will create one table for each node and insert one record.


# ----------------------------------------------------------------------------     
#!/bin/bash
multi_sb=$1
if [ -z "$multi_sb" ]
then
    echo multiple sandbox path needed
    exit 1
fi
if [ ! -d $multi_sb ]
then
    echo directory $multi_sb not found
    exit 1
fi
if [ ! -d "$multi_sb/node3" ]
then
    echo directory $multi_sb/node3 not found
    exit 1
fi
cd $multi_sb

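# Each node creates and fills its own table; replication propagates all three.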
for N in  1 2 3 ; do
    ./n$N -e "create schema if not exists test"
    ./n$N -e "drop table if exists test.t$N"
    ./n$N -e "create table test.t$N(id int not null primary key, sid int)"
    ./n$N -e "insert into  test.t$N values ($N, @@server_id)"
done

./use_all 'select * from test.t1 union select * from test.t2 union select * from test.t3'
# ----------------------------------------------------------------------------

We run the script in both clusters, and at the end we'll have the test database with three tables, each one created and filled by a different node.
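
For example, assuming the script above was saved as test_data.sh:

$ ./test_data.sh $HOME/sandboxes/GR
$ ./test_data.sh $HOME/sandboxes/MS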

Checking replication status

The old topology: multi-source

Let's start with the old technology, so we can easily compare it with the new one.

node1 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00014318-2222-2222-2222-222222222222   # ----
                THREAD_ID: 32
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 244
 LAST_HEARTBEAT_TIMESTAMP: 2017-01-22 13:31:54
 RECEIVED_TRANSACTION_SET: 00014318-2222-2222-2222-222222222222:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00014319-3333-3333-3333-333333333333   # ----
                THREAD_ID: 34
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 244
 LAST_HEARTBEAT_TIMESTAMP: 2017-01-22 13:31:55
 RECEIVED_TRANSACTION_SET: 00014319-3333-3333-3333-333333333333:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
2 rows in set (0.00 sec)

Notice that we are benefiting from a feature of MySQL-Sandbox that creates a more readable version of the server UUID: the first segment embeds the port (14318) and the repeated digits the node number (2). This way we can easily identify the nodes. Here we see that each transaction set has a clearly defined origin. We can see similar information in the replication tables of the mysql database:

node1 [localhost] {msandbox} (mysql) > select * from slave_master_info\G
*************************** 1. row ***************************
       Number_of_lines: 25
       Master_log_name: mysql-bin.000001
        Master_log_pos: 154
                  Host: 127.0.0.1       # ----
             User_name: rsandbox
         User_password: rsandbox
                  Port: 14318           # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid: 00014318-2222-2222-2222-222222222222  # ----
           Retry_count: 86400
           Ssl_crlpath:
 Enabled_auto_position: 0
          Channel_name: node2
           Tls_version:
*************************** 2. row ***************************
       Number_of_lines: 25
       Master_log_name: mysql-bin.000001
        Master_log_pos: 154
                  Host: 127.0.0.1    # ----
             User_name: rsandbox
         User_password: rsandbox
                  Port: 14319        # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid: 00014319-3333-3333-3333-333333333333  # ----
           Retry_count: 86400
           Ssl_crlpath:
 Enabled_auto_position: 0
          Channel_name: node3
           Tls_version:
2 rows in set (0.00 sec)

Additionally, we have SHOW SLAVE STATUS, which, although not the ideal monitoring tool, is still the only place where we can see at once both the received and executed transactions, and the corresponding binary log and relay log records.

Here's an abridged version:

node1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 14318
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 965
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 1178
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 965
              Relay_Log_Space: 1387
             Master_Server_Id: 102
                  Master_UUID: 00014318-2222-2222-2222-222222222222
             Master_Info_File: mysql.slave_master_info
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
           Retrieved_Gtid_Set: 00014318-2222-2222-2222-222222222222:1-4
            Executed_Gtid_Set: 00014317-1111-1111-1111-111111111111:1-4,
00014318-2222-2222-2222-222222222222:1-4,
00014319-3333-3333-3333-333333333333:1-4
                 Channel_Name: node2
*************************** 2. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 14319
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 965
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1178
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 965
              Relay_Log_Space: 1387
              Until_Condition: None
             Master_Server_Id: 103
                  Master_UUID: 00014319-3333-3333-3333-333333333333
             Master_Info_File: mysql.slave_master_info
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
           Retrieved_Gtid_Set: 00014319-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00014317-1111-1111-1111-111111111111:1-4,
00014318-2222-2222-2222-222222222222:1-4,
00014319-3333-3333-3333-333333333333:1-4
                 Channel_Name: node3
2 rows in set (0.00 sec)

Finally, we'll have a look at the data itself:

node1 [localhost] {msandbox} (mysql) > show binlog events;
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name         | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysql-bin.000001 |   4 | Format_desc    |       101 |         123 | Server ver: 5.7.16-log, Binlog ver: 4                             |
| mysql-bin.000001 | 123 | Previous_gtids |       101 |         154 |                                                                   |
| mysql-bin.000001 | 154 | Gtid           |       101 |         219 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:1' |
| mysql-bin.000001 | 219 | Query          |       101 |         325 | create schema if not exists test                                  |
| mysql-bin.000001 | 325 | Gtid           |       101 |         390 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:2' |
| mysql-bin.000001 | 390 | Query          |       101 |         518 | DROP TABLE IF EXISTS `test`.`t1` /* generated by server */        |
| mysql-bin.000001 | 518 | Gtid           |       101 |         583 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:3' |
| mysql-bin.000001 | 583 | Query          |       101 |         711 | create table test.t1(id int not null primary key, sid int)        |
| mysql-bin.000001 | 711 | Gtid           |       101 |         776 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:4' |
| mysql-bin.000001 | 776 | Query          |       101 |         844 | BEGIN                                                             |
| mysql-bin.000001 | 844 | Table_map      |       101 |         890 | table_id: 108 (test.t1)                                           |
| mysql-bin.000001 | 890 | Write_rows     |       101 |         934 | table_id: 108 flags: STMT_END_F                                   |
| mysql-bin.000001 | 934 | Xid            |       101 |         965 | COMMIT /* xid=72 */                                               |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
13 rows in set (0.00 sec)

The binary log contains only the data produced on this node.
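
To repeat this check on every node at once, we can use the use_all shortcut that MySQL-Sandbox provides; each node's binary log contains only events with its own server-id:

$ ~/sandboxes/MS/use_all 'show binlog events'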

The new topology: MGR

Turning to the new software, let's first check whether replication is working. An important note here: SHOW SLAVE STATUS is not available in MGR. That's not entirely true: the channel architecture used for multi-source replication has been hijacked to convey information about group problems. If something goes wrong during the setup, you will find the information in the group_replication_recovery channel.

node1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS for channel 'group_replication_recovery';
Empty set (0.00 sec)

When things are fine, the tables in performance_schema report a satisfactory status:

node1 [localhost] {msandbox} (performance_schema) > select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 00014418-1111-1111-1111-111111111111 | gmini       |       14418 | ONLINE       |
| group_replication_applier | 00014419-2222-2222-2222-222222222222 | gmini       |       14419 | ONLINE       |
| group_replication_applier | 00014420-3333-3333-3333-333333333333 | gmini       |       14420 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

The above command tells us that all nodes are online.

Next, we ask for the stats of the current member:

node1 [localhost] {msandbox} (performance_schema) > select * from replication_group_member_stats\G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 14850806532423012:3
                         MEMBER_ID: 00014418-1111-1111-1111-111111111111
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 12
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
    LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006
1 row in set (0.00 sec)

The same operation from a different member will give a very similar result.

node2 [localhost] {msandbox} (performance_schema) > select * from replication_group_member_stats\G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 14850806532423012:3
                         MEMBER_ID: 00014419-2222-2222-2222-222222222222
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 12
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
    LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006
1 row in set (0.00 sec)

Then, we check the more classical replication status:

node1 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: group_replication_applier
               GROUP_NAME: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
              SOURCE_UUID: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee    # ----
                THREAD_ID: NULL
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
 LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
 RECEIVED_TRANSACTION_SET: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
1 row in set (0.00 sec)

There are a few things that strike the observer immediately:

  • As we saw in the single-primary topology, all transactions bear the UUID of the group, not of the server that generated them. While in single-primary mode this could be considered an asset, as it simplifies a failover procedure, in multi-primary mode I consider it to be a loss: we lose the knowledge of the transaction provenance. As you can see, the SOURCE_UUID field shows the group ID instead of the node's.
  • The GTID numbers look odd. There is a set that starts at 1, another set that starts at 1 million, and a third one that starts at 2 million. What's going on? The answer is in the value of group_replication_gtid_assignment_block_size (see the query below), which determines the block of values allotted to each node. When the values in a block are exhausted, the node allocates another block. One could naively think of using these blocks to identify which node the data comes from, but that would ultimately be wrong, for two reasons:
    • The blocks are assigned on a first-come-first-served basis. If we start operations on node 2, its transactions will bear the lowest numbers.
    • When the blocks are exhausted, the node starts a new block, meaning that on a busy cluster we will have a hard time identifying which node uses which block.
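
We can check the block size directly; its default is 1,000,000, which matches the million-apart offsets above:

node1 [localhost] {msandbox} (performance_schema) > select @@group_replication_gtid_assignment_block_size;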

If someone thought that we could get more information from the replication tables in the mysql schema, they are in for a disappointment:

node2 [localhost] {msandbox} (mysql) > select * from slave_master_info\G
*************************** 1. row ***************************
       Number_of_lines: 25
       Master_log_name:
        Master_log_pos: 4
                  Host: <NULL>            # ----
             User_name:
         User_password:
                  Port: 0                 # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid:                   # ----
           Retry_count: 86400
 Enabled_auto_position: 1
          Channel_name: group_replication_applier
           Tls_version:
*************************** 2. row ***************************
       Number_of_lines: 25
       Master_log_name:
        Master_log_pos: 4
                  Host: <NULL>
             User_name: rsandbox
         User_password: rsandbox
                  Port: 0
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid:
           Retry_count: 1
 Enabled_auto_position: 1
          Channel_name: group_replication_recovery
           Tls_version:
2 rows in set (0.00 sec)

The table shows group operations rather than individual host connections. There is no origin information here.

Looking at the events, we will notice immediately some more differences.

node2 [localhost] {msandbox} (mysql) > show binlog events;
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
| Log_name         | Pos  | Event_type     | Server_id | End_log_pos | Info                                                                    |
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
| mysql-bin.000001 |    4 | Format_desc    |       102 |         123 | Server ver: 5.7.17-log, Binlog ver: 4                                   |
| mysql-bin.000001 |  123 | Previous_gtids |       102 |         150 |                                                                         |
| mysql-bin.000001 |  150 | Gtid           |       101 |         211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1'       |
| mysql-bin.000001 |  211 | Query          |       101 |         270 | BEGIN                                                                   |
| mysql-bin.000001 |  270 | View_change    |       101 |         369 | view_id=14850806532423012:1                                             |
| mysql-bin.000001 |  369 | Query          |       101 |         434 | COMMIT                                                                  |
| mysql-bin.000001 |  434 | Gtid           |       101 |         495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2'       |
| mysql-bin.000001 |  495 | Query          |       101 |         554 | BEGIN                                                                   |
| mysql-bin.000001 |  554 | View_change    |       101 |         693 | view_id=14850806532423012:2                                             |
| mysql-bin.000001 |  693 | Query          |       101 |         758 | COMMIT                                                                  |
| mysql-bin.000001 |  758 | Gtid           |       102 |         819 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:3'       |
| mysql-bin.000001 |  819 | Query          |       102 |         878 | BEGIN                                                                   |
| mysql-bin.000001 |  878 | View_change    |       102 |        1017 | view_id=14850806532423012:3                                             |
| mysql-bin.000001 | 1017 | Query          |       102 |        1082 | COMMIT                                                                  |
| mysql-bin.000001 | 1082 | Gtid           |       101 |        1143 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:4'       |
| mysql-bin.000001 | 1143 | Query          |       101 |        1250 | create schema if not exists test                                        |
| mysql-bin.000001 | 1250 | Gtid           |       101 |        1311 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:5'       |
| mysql-bin.000001 | 1311 | Query          |       101 |        1440 | DROP TABLE IF EXISTS `test`.`t1` /* generated by server */              |
| mysql-bin.000001 | 1440 | Gtid           |       101 |        1501 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:6'       |
| mysql-bin.000001 | 1501 | Query          |       101 |        1630 | create table test.t1(id int not null primary key, sid int)              |
| mysql-bin.000001 | 1630 | Gtid           |       101 |        1691 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:7'       |
| mysql-bin.000001 | 1691 | Query          |       101 |        1755 | BEGIN                                                                   |
| mysql-bin.000001 | 1755 | Table_map      |       101 |        1797 | table_id: 219 (test.t1)                                                 |
| mysql-bin.000001 | 1797 | Write_rows     |       101 |        1837 | table_id: 219 flags: STMT_END_F                                         |
| mysql-bin.000001 | 1837 | Xid            |       101 |        1864 | COMMIT /* xid=51 */                                                     |
| mysql-bin.000001 | 1864 | Gtid           |       102 |        1925 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000003' |
| mysql-bin.000001 | 1925 | Query          |       102 |        2032 | create schema if not exists test                                        |
| mysql-bin.000001 | 2032 | Gtid           |       102 |        2093 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000004' |
| mysql-bin.000001 | 2093 | Query          |       102 |        2222 | DROP TABLE IF EXISTS `test`.`t2` /* generated by server */              |
| mysql-bin.000001 | 2222 | Gtid           |       102 |        2283 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000005' |
| mysql-bin.000001 | 2283 | Query          |       102 |        2412 | create table test.t2(id int not null primary key, sid int)              |
| mysql-bin.000001 | 2412 | Gtid           |       102 |        2473 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000006' |
| mysql-bin.000001 | 2473 | Query          |       102 |        2542 | BEGIN                                                                   |
| mysql-bin.000001 | 2542 | Table_map      |       102 |        2584 | table_id: 220 (test.t2)                                                 |
| mysql-bin.000001 | 2584 | Write_rows     |       102 |        2624 | table_id: 220 flags: STMT_END_F                                         |
| mysql-bin.000001 | 2624 | Xid            |       102 |        2651 | COMMIT /* xid=62 */                                                     |
| mysql-bin.000001 | 2651 | Gtid           |       103 |        2712 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000003' |
| mysql-bin.000001 | 2712 | Query          |       103 |        2819 | create schema if not exists test                                        |
| mysql-bin.000001 | 2819 | Gtid           |       103 |        2880 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000004' |
| mysql-bin.000001 | 2880 | Query          |       103 |        3009 | DROP TABLE IF EXISTS `test`.`t3` /* generated by server */              |
| mysql-bin.000001 | 3009 | Gtid           |       103 |        3070 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000005' |
| mysql-bin.000001 | 3070 | Query          |       103 |        3199 | create table test.t3(id int not null primary key, sid int)              |
| mysql-bin.000001 | 3199 | Gtid           |       103 |        3260 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006' |
| mysql-bin.000001 | 3260 | Query          |       103 |        3324 | BEGIN                                                                   |
| mysql-bin.000001 | 3324 | Table_map      |       103 |        3366 | table_id: 221 (test.t3)                                                 |
| mysql-bin.000001 | 3366 | Write_rows     |       103 |        3406 | table_id: 221 flags: STMT_END_F                                         |
| mysql-bin.000001 | 3406 | Xid            |       103 |        3433 | COMMIT /* xid=68 */                                                     |
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
47 rows in set (0.00 sec)

Two important points:

  • All transaction IDs are assigned to the group, not to the node. The only way to see where the data is coming from is to look at the binary log itself and check the good old server-id. One wonders why we have come all this way with the ugly UUIDs in the global transaction identifier, only to maim their usefulness by removing one of their most important features: tracking the data origin.

For example, dumping node2's binary log with mysqlbinlog (the path below follows the sandbox layout used above):

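$ mysqlbinlog $HOME/sandboxes/GR/node2/data/mysql-bin.000001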
# at 434
#170122 11:24:11 server id 101  end_log_pos 495         GTID    last_committed=1        sequence_number=2
SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2'/*!*/;
# at 495
#170122 11:24:11 server id 101  end_log_pos 554         Query   thread_id=7     exec_time=6     error_code=0
SET TIMESTAMP=1485080651/*!*/;
BEGIN
/*!*/;
  • Because log-slave-updates is mandatory, the binary log on every node contains all the transactions of every other node. This can have disagreeable side effects when dealing with large data. Here is an example, loading the sample employees database from node #1:

With Group Replication, the load takes 2 minutes and 16 seconds, and the binary logs have the same size in every node.

[GR]$ ls -lh node?/data/*bin*
-rw-r-----  1 gmax  staff   8.2K Jan 22 10:22 node1/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node1/data/mysql-bin.000002
-rw-r-----  1 gmax  staff    38B Jan 22 10:22 node1/data/mysql-bin.index

-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node2/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:12 node2/data/mysql-bin.index

-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node3/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:12 node3/data/mysql-bin.index

The same operation in multi-source replication takes 1 minute and 30 seconds. The bulk of the binary logs is kept only at the origin.

[MS]$ ls -lh node?/data/*bin*
-rw-r-----  1 gmax  staff   4.9K Jan 22 10:26 node1/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    63M Jan 22 10:27 node1/data/mysql-bin.000002
-rw-r-----  1 gmax  staff    38B Jan 22 10:26 node1/data/mysql-bin.index

-rw-r-----  1 gmax  staff   1.4K Jan 22 10:14 node2/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:14 node2/data/mysql-bin.index

-rw-r-----  1 gmax  staff   1.4K Jan 22 10:14 node3/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:14 node3/data/mysql-bin.index

Conflict resolution

One of the strong points of MGR is conflict resolution.

We can try a conflicting operation on two nodes, inserting the same data at the same time:

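-- run simultaneously on node1 and node2: same primary key, different @@server_id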
use test;
set autocommit=0;
insert into t2 values (3, @@server_id);
commit;

In multi-source, we get a replication error on both nodes. It's an ugly result, but it tells the user immediately that something went wrong on a given node, and it doesn't let the error propagate to the other nodes.
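
One place where such a failure surfaces (besides SHOW SLAVE STATUS for the affected channel) is the per-worker applier status in performance_schema:

node1 [localhost] {msandbox} (performance_schema) > select channel_name, last_error_number, last_error_message from replication_applier_status_by_worker;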

In MGR, the situation varies. This is a possible outcome:

node1 [localhost] {msandbox} (test) > set autocommit=0;                        |   node2 [localhost] {msandbox} (test) > set autocommit=0;
Query OK, 0 rows affected (0.00 sec)                                           |   Query OK, 0 rows affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > insert into t2 values (3, @@server_id);  |   node2 [localhost] {msandbox} (test) > insert into t2 values (3, @@server_id);
Query OK, 1 row affected (0.00 sec)                                            |   Query OK, 1 row affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  102 |
+----+------+                                                                  |   +----+------+
2 rows in set (0.00 sec)                                                       |   2 rows in set (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > commit;                                  |   node2 [localhost] {msandbox} (test) > commit;
Query OK, 0 rows affected (0.01 sec)                                           |   ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.
                                                                               |   node2 [localhost] {msandbox} (test) > select * from t2;
 node1 [localhost] {msandbox} (test) > select * from t2;                       |   +----+------+
 +----+------+                                                                 |   | id | sid  |
 | id | sid  |                                                                 |   +----+------+
 +----+------+                                                                 |   |  2 |  102 |
 |  2 |  102 |                                                                 |   |  3 |  101 |
 |  3 |  101 |                                                                 |   +----+------+
 +----+------+                                                                 |   2 rows in set (0.00 sec)
 2 rows in set (0.00 sec)                                                      |

Here node #2 got the transaction a fraction of a second later, and its transaction was rolled back. Thus the transaction that was ultimately kept in the database is the one from node1 (server-id 101). However, this behavior is not predictable. If we try the same operation again, we get a different outcome:

node1 [localhost] {msandbox} (test) > insert into t2 values (4, @@server_id);  |   node2 [localhost] {msandbox} (test) > insert into t2 values (4, @@server_id);
Query OK, 1 row affected (0.00 sec)                                            |   Query OK, 1 row affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  101 |
|  4 |  101 |                                                                  |   |  4 |  102 |
+----+------+                                                                  |   +----+------+
3 rows in set (0.00 sec)                                                       |   3 rows in set (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > commit;                                  |   node2 [localhost] {msandbox} (test) > commit;
Query OK, 0 rows affected (0.01 sec)                                           |
ERROR 3101 (HY000): Plugin instructed the server to rollback                   |
the current transaction.                                                       |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  101 |
|  4 |  102 |                                                                  |   |  4 |  102 |
+----+------+                                                                  |   +----+------+
 3 rows in set (0.00 sec)                                                      |   3 rows in set (0.00 sec)

In the second attempt, the transaction was rolled back by node 1, and the surviving one is the one inserted from node 2. This means that conflict resolution works, but it may not be what the user wants: the outcome of the conflict is aleatory, as whichever transaction is certified first wins.

Summing up

On the plus side, MGR keeps what it promises. We can set up a cluster of peer nodes and replicate data between nodes with some advantages compared to older multi-source topologies.

On the minus side, the documentation could be vastly improved, especially for multi-primary setups. Moreover, users need to be aware of the limitations, such as the SERIALIZABLE isolation level and cascading foreign key constraints not being supported in multi-primary mode.

Most important from my standpoint is the reduction of monitoring information for this technology, namely the loss of information about the data origin.