A few weeks ago I started experimenting with MySQL InnoDB cluster. As part of the testing, I tried to kill a node to see what happens to the cluster.
The good news is that the cluster is resilient. When the primary node goes missing, the cluster replaces it immediately, and operations continue. This is one of the features of a High Availability (HA) system, but this feature alone does not define the usefulness or the robustness of the system. In one of my previous jobs I worked on testing a commercial HA system, and I learned a few things about what makes a reliable system.
Armed with this knowledge, I did some more experiments with InnoDB Cluster. The test in my previous article had no other goal than seeing operations continue smoothly after a primary node replacement. In this article, I examine a few more features of an HA system:
- Making sure that a failed primary node does not try to force itself back into the cluster;
- Properly welcoming a failed node into the cluster;
- Handling a split-brain cluster.
To explore the above features (or the lack thereof), we are going to simulate some mundane occurrences. We start with the same cluster seen in the previous article, built with Docker InnoDB Cluster. The initial state is:
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}
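For context, the status above and the checks that follow come from a small helper, check_cluster.sh, whose contents are not listed in this article. A minimal sketch of what such a script might look like, assuming the same container names and the secretpassword.txt file used later on:
#!/bin/bash
# Hypothetical sketch of tests/check_cluster.sh (the real script is not shown here).
# It asks the given node (default: node 1) for the cluster status via the AdminAPI.
node=${1:-1}
docker exec -i mysqlgr$node mysqlsh \
    --uri root@mysqlgr$node:3306 -p$(cat secretpassword.txt) \
    --js -e 'print(dba.getCluster().status())'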
The first experiment is to restart a non-primary node
$ docker restart mysqlgr2
and see what happens to the cluster
$ ./tests/check_cluster.sh | grep 'primary\|address\|status'
"primary": "mysqlgr1:3306",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
"address": "mysqlgr1:3306",
"status": "ONLINE"
"address": "mysqlgr2:3306",
"status": "(MISSING)"
"address": "mysqlgr3:3306",
"status": "ONLINE"
The cluster detects that one member is missing, but after a few seconds things go back to normal:
$ ./tests/check_cluster.sh | grep 'primary\|address\|status'
"primary": "mysqlgr1:3306",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"address": "mysqlgr1:3306",
"status": "ONLINE"
"address": "mysqlgr2:3306",
"status": "ONLINE"
"address": "mysqlgr3:3306",
"status": "ONLINE"
This looks good. Now, let's do the same to the primary node
$ docker restart mysqlgr1
$ ./tests/check_cluster.sh 2 | grep 'primary\|address\|status'
"primary": "mysqlgr2:3306",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
"address": "mysqlgr1:3306",
"status": "(MISSING)"
"address": "mysqlgr2:3306",
"status": "ONLINE"
"address": "mysqlgr3:3306",
"status": "ONLINE"
As before, the cluster detects that a node is missing and excludes it. Since the missing node was the primary, another node is promoted to primary.
However, this time the node does not come back into the cluster. Checking the cluster status again after several minutes, node 1 is still reported as missing. This is not a bug. This is a feature of well-behaved HA systems: a primary node that has already been replaced should not rejoin the cluster automatically.
This experiment also went well. Now, for the interesting part, let's look at the split-brain situation.
At this moment, the cluster is split into two parts, and each one sees it differently. The view from the current primary node is the one reported above, and it is what we would expect: node 1 is not available. But if we ask node 1 for the cluster status, we get a different picture:
$ ./tests/check_cluster.sh 1 | grep 'primary\|address\|status'
"primary": "mysqlgr1:3306",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
"address": "mysqlgr1:3306",
"status": "ONLINE"
"address": "mysqlgr2:3306",
"status": "(MISSING)"
"address": "mysqlgr3:3306",
"status": "(MISSING)"
Node 1 thinks it is the primary and that the other two nodes are missing. Nodes 2 and 3 think that node 1 is missing.
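When the views diverge like this, it is handy to compare all three nodes side by side. A small loop around the same helper script does the trick:
$ for node in 1 2 3; do echo "--- view from mysqlgr$node ---"; ./tests/check_cluster.sh $node | grep 'primary\|address\|status'; done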
In a sane system, the logical way to proceed is to admit the failed node back into the cluster, after checking that it is safe to do so. The InnoDB Cluster AdminAPI provides a rejoinInstance method that lets us bring an instance back:
$ docker exec -it mysqlgr2 mysqlsh --uri root@mysqlgr2:3306 -p$(cat secretpassword.txt)
mysql-js> cluster = dba.getCluster()
<Cluster:testcluster>
mysql-js> cluster.rejoinInstance('mysqlgr1:3306')
Rejoining the instance to the InnoDB cluster. Depending on the original
problem that made the instance unavailable, the rejoin operation might not be
successful and further manual steps will be needed to fix the underlying
problem.
Please monitor the output of the rejoin operation and take necessary action if
the instance cannot rejoin.
Please provide the password for 'root@mysqlgr1:3306':
Rejoining instance to the cluster ...
The instance 'root@mysqlgr1:3306' was successfully rejoined on the cluster.
The instance 'mysqlgr1:3306' was successfully added to the MySQL Cluster.
Sounds good, eh? Apparently, we have node 1 back in the fold. Let's check:
$ ./tests/check_cluster.sh 2 | grep 'primary\|address\|status'
"primary": "mysqlgr2:3306",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
"address": "mysqlgr1:3306",
"status": "(MISSING)"
"address": "mysqlgr2:3306",
"status": "ONLINE"
"address": "mysqlgr3:3306",
"status": "ONLINE"
Nope. Node 1 is still missing. And if we try to rescan the cluster, we see that the rejoin call was not effective:
mysql-js> cluster.rescan()
Rescanning the cluster...
Result of the rescanning operation:
{
    "defaultReplicaSet": {
        "name": "default",
        "newlyDiscoveredInstances": [],
        "unavailableInstances": [
            {
                "host": "mysqlgr1:3306",
                "label": "mysqlgr1:3306",
                "member_id": "6bd04911-4374-11e7-b780-0242ac170002"
            }
        ]
    }
}
The instance 'mysqlgr1:3306' is no longer part of the HA setup. It is either offline or left the HA group.
You can try to add it to the cluster again with the cluster.rejoinInstance('mysqlgr1:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y|n]: n
It's curious (and frustrating) that we get a recommendation to run the very same function that we attempted just a minute ago.
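For the record, the rescan prompt above points at the other documented path: remove the stale entry from the cluster metadata and then add the node back as if it were a new member. From a mysql-js session on the current primary, that would look roughly like this (not tested here, since we answered 'n' above):
mysql-js> cluster.removeInstance('mysqlgr1:3306')    // drop the stale metadata entry
mysql-js> cluster.addInstance('root@mysqlgr1:3306')  // provision the node back as a new member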
But, just as a devilish thought, let's try the same operation from the invalid partition, i.e. from node 1.
$ docker exec -it mysqlgr1 mysqlsh --uri root@mysqlgr1:3306 -p$(cat secretpassword.txt)
mysql-js> cluster = dba.getCluster()
<Cluster:testcluster>
mysql-js> cluster.rejoinInstance('mysqlgr2:3306')
Rejoining the instance to the InnoDB cluster. Depending on the original
problem that made the instance unavailable, the rejoin operation might not be
successful and further manual steps will be needed to fix the underlying
problem.
Please monitor the output of the rejoin operation and take necessary action if
the instance cannot rejoin.
Please provide the password for 'root@mysqlgr2:3306':
Rejoining instance to the cluster ...
The instance 'root@mysqlgr2:3306' was successfully rejoined on the cluster.
The instance 'mysqlgr2:3306' was successfully added to the MySQL Cluster.
mysql-js> cluster.status()
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            }
        }
    }
}
Now, this was definitely not supposed to happen. The formerly failed node has invited a healthy node into its minority partition, and the operation succeeded!
The horrible part? This illegal operation succeeded in reconciling the views of node 1 and node 2. Now node 2 also thinks that node 1 is the primary again, and node 3 (which was minding its own business and never had any accidents) is considered missing:
$ ./tests/check_cluster.sh 2 | grep 'primary\|address\|status'
"primary": "mysqlgr1:3306",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
"address": "mysqlgr1:3306",
"status": "ONLINE"
"address": "mysqlgr2:3306",
"status": "ONLINE"
"address": "mysqlgr3:3306",
"status": "(MISSING)"
And node 3 suddenly finds itself in the role of the failed node, even though it had nothing to do with the previous operations:
$ ./tests/check_cluster.sh 3 | grep 'primary\|address\|status'
"primary": "mysqlgr3:3306",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
"address": "mysqlgr1:3306",
"status": "(MISSING)"
"address": "mysqlgr2:3306",
"status": "(MISSING)"
"address": "mysqlgr3:3306",
"status": "ONLINE"
In short, while we were attempting to fix a split-brain, we ended up with a different split-brain and an unexpected node promotion. This is clearly a bug, and I hope the MySQL team will make the system more robust.
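For completeness: the AdminAPI does have an official escape hatch for a partition that has lost quorum, Cluster.forceQuorumUsingPartitionOf(), which rebuilds the group around the partition you designate as authoritative. It was not part of this experiment, but in a situation like the one above it would be run, roughly, from the side you want to keep:
$ docker exec -it mysqlgr3 mysqlsh --uri root@mysqlgr3:3306 -p$(cat secretpassword.txt)
mysql-js> cluster = dba.getCluster()
mysql-js> cluster.forceQuorumUsingPartitionOf('root@mysqlgr3:3306')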