The Data Charmer: August 2015

Monday, August 31, 2015

MySQL replication in action - Part 5 - parallel appliers

Previous episodes:

MySQL replication monitoring 101

MySQL replication in action - Part 1: GTID & Co

MySQL replication in action - Part 2 - Fan-in topology

MySQL replication in action - Part 3 - All-masters P2P topology

MySQL replication in action - Part 4 - star and hybrid topologies

Parallel replication overview

One of the main grievance of replication users is that, while a well tuned master server can handle thousands of concurrent operations, an equally tuned slave is constrained to work on a single thread. In Figure 1, we see the schematics of this paradigm. Multiple operations on the master are executed simultaneously and saved to the binary log. The slave IO thread copies the binary log events to a local log, and on such log the SQL thread executes the events on the slave database. When the master is very active, chances are that the slave lags behind, causing hatred and nightmares to the DBAs.

Figure 1 - Single applier

New MySQL Sandbox 3.1 - GitHub, and usability

I have three pieces of information to share about MySQL::Sandbox:

Version 3.1.0 has migrated from Launchpad to GitHub
This version is released under the Apache license. Both these changes are meant to improve and promote cooperation on the project.
There is an important change related to usability. When using replication with MySQL::Sandbox and MySQL 5.6+, the server UUIDs become more readable (see below).

First, some words on the location changes. About two years ago, I started plans for a rewrite of MySQL::Sandbox. Then, I had some unexpected changes, which involved moving home to a different continent twice within twelve months. The project was neglected, but I haven't dismissed it. While I wait for the rewrite to start, I wanted to get rid of the obstacles for rapid development, and I decided to transfer the current codebase to GitHub. This will allow me to use only one RCS instead of three (My team has abandoned svn too ¹).
Apart from the changes described in this post, there is little difference in the code published on GitHub.

Tungsten Replicator moved to GitHub with Apache license

It had been in the making for long time. Google announced that Google Code would be closing, and since then the Continuent team has been hard at work to handle the transition. You can guess it: this operation would have been quicker if it had been done by a small company like we were one year ago, but being part of a large corporation introduces some constraints that have affected our schedule.

However, our wish has always been, and still is, to keep Tungsten Replicator as an open source product, with full functionalities and with the full benefits that the open source development model offers.

Today, Tungsten Replicator is available on GitHub as vmware/tungsten-replicator, and it is wearing new clothes. It is not GPL anymore. In an effort to facilitate contributions, its license was changed to Apache 2.0.

Feature-wise, there is little difference from the previous release of 4.0. Mainly, we have cleaned up the code and moved out the pieces that no longer fit:

Bristlecone was removed from the package. It is used only for testing, and it will be released separately. There is no need to duplicate it into every Tungsten tarball.
The cookbook recipes have been retired. These scripts were created when the installer was still in its infancy and we had little documentation. Therefore, it was convenient to have wrappers for the common installation operations. Using the manual, it is pretty easy to install master/slave, fan-in, and multi-master topologies. The biggest reason for removing the cookbook, though, is that it was only useful for MySQL replication. If you need heterogenous deployments, the cookbook was an obstacle, rather than being helpful.
Some files were shuffled within the deployment tree. The ./tungsten-replicator/scripts directory was merged with ./tungsten-replicator/bin, the applier templates were moved from samples to a dedicated path, and we also did some other similar cleanup.

Although it has changed location and license, this is not a "release." If you compile the code, it will come up as 4.1, but it is still work in progress. Same as what was happening in the previous repository, we tag the code with the next version, and start working on it until it is deemed ready for release. The latest release for production (4.0.1) is still available from the old directory.

The code is available on GitHub, which makes collaboration much simpler than the previous repository. Take advantage of it: fork it, and help make the best replication tool even better!

MySQL usability issues for replication

In my latest series of advanced replication features, I came across several usability issues, which I would like to recap here. For each section of this list of requests, I make a wish list, with some general comments.

INSTALLATION

As the maintainer of MySQL Sandbox, a tool that wants to facilitate the installation of MySQL for testing, I am particularly sensitive to the annoyances during installation. I have covered the matter extensively in recent weeks and many times before. My point is that the logging of the installation must be useful for the DBA, not to the MySQL developers.

Make a big distinction between [Info] and [Warning] messages. Warnings should only appear when there is some risk. Using a warning to tell the user that the InnoDB log file was created is a nonsense.
… and then make [Info] messages suppressible. Currently, I can suppress warnings, but then I would suppress also the real warnings, in addition to the fake ones.
Make mysqld –initialize use the correct syntax for timestamps, instead of issuing a warning about the obsolete syntax that was used. (See In search of cleanliness : the elusive quiet installation for a full explanation)

MULTI-SOURCE

Here we have two main problems. The first one is that the introduction of multi source channels has left some new ways of shooting yourself in the foot. By mixing the new explicitly named channels with the hidden one (but still used in regular replication!) which is named "" (= the empty string), we may start to see a new series of blunders in replication.
The second set of problems comes from weak integration between existing features and the new one. There are operation modes, such as semi-synchronous, delayed, and parallel replication that were designed with a single data stream in mind, and that lack the capability of being tuned for different channels.

Make it impossible to use the unnamed channel and the named ones at the same time. Currently only a very weak protection exists against mixing named and unnamed channels. Also the syntax for channel ’’ should result in an error when named channels are defined.
Integrate features that were designed for single sources to work better with multi source: semi-sync replication, parallel applier, delayed replication.

MySQL replication in action - Part 4 - star and hybrid topologies

Previous episodes:

MySQL replication monitoring 101

MySQL replication in action - Part 1: GTID & Co

MySQL replication in action - Part 2 - Fan-in topology

MySQL replication in action - Part 3 - All-masters P2P topology

Introducing star topology.

In all-masters P2P topologies, we have seen that we have a way of deploying a topology where all nodes are masters, and achieve better efficiency and stability than ring topologies. That method comes at the price of a complex setup, which requires, for a N-node cluster, N*(N-1) connections.
We can achieve the same result as in a P2P all-masters topology by trading connections for stability. In a star topology (Figure 1) all nodes are masters, but they do not connect to each other directly. There is a special node, named hub, which receives the changes produced by each endpoint and spreads them to the others.
Topologies star

Figure 1 - A star topology

MySQL replication in action - Part 3: all-masters P2P topology

Previous episodes:

MySQL replication monitoring 101

MySQL replication in action - Part 1: GTID & Co

MySQL replication in action - Part 2 - Fan-in topology

In the previous article, we saw the basics of establishing replication from multiple origins to the same destination. By extending that concept, we can deploy more complex topologies, such as the point-to-point (P2P) all-masters topology, a robust and fast way of moving data.

Introduction to P2P all-masters topology

A P2P (Point-to-point) topology is a kind of deployment where replication happens in a single step from the producer to the consumers. For example, in a master/slave topology, replication from the master (producer) reaches every slave (consumer) in one step. This is simple P2P replication. If we use a hierarchical deployment, where every slave that is connected to the master is also replicating to one or more slaves, we will have a 2-step replication (Figure 1). Similarly, in circular replication, we have as many steps as the number of nodes minus one (Figure 2.)
Hierarchical master slave processing

Figure 1 - Hierarchical replication depth of processing

MySQL replication in action - Part 2 - Fan-in topology

Introduction: where we stand

Previous episodes:

MySQL replication monitoring 101

MySQL replication in action - Part 1: GTID & Co

In the latest releases of MySQL and MariaDB we have seen several replication improvements. One of the most exciting additions is the ability to enhance basic replication with multiple sources. Those who have used replication for a while should remember that one of the tenets of the “old” replication was that a slave couldn’t have more than one master. This was The Law and there was no escape ... until now. The only way to work around that prohibition was to use circular replication, also known as ring replication, where each node is slave of the previous node and master of the next one.
Circular replication

MySQL replication in action - Part 1: GTID & Co

In the theoretical part of this series, we have seen the basics of monitoring. In that article, though, we have barely mentioned the new tools available in MySQL 5.7 and MariaDB 10. Let’s start from something that has the potential of dramatically changing replication as we know it.

Crash-safe tables and Global transaction identifiers in MySQL 5.6 and 5.7

Global transaction identifiers (GTID) is a feature that has been in my wish list for long time, since the times I was working with the MySQL team. By the time I left Oracle, this feature was not even in the plans.
When MySQL 5.6 was first disclosed, the biggest improvement for replication was the introduction of crash-safe tables (see Status persistence in Monitoring 101.) There are two tables in the mysql database, named slave_master_info and slave_relay_log_info. At the beginning, these tables were using the MyISAM engine, thus defeating the purpose of making them crash-safe. In later versions, the developers decided to bite the bullet and create these tables with innodb from the beginning.
These two tables allow us to see the same information previously stored in the files master.info and relay_log.info. What makes these tables convenient is that they should survive a crash better than the standalone files.

MySQL::Sandbox 3.0.66 - improved usability and support for newest releases

The latest MySQL Sandbox, version 3.0.66 is out. It has a few new features (as always, when I find myself doing the same thing many times, I script it) and improved support for latest releases of MySQL. You can now install, among other versions, MySQL 5.7.8 and MariaDB 10.1.x

Some notable additions in this release are in the scripts that are created and customized for each sandbox. There are many of them and when one more arrives, it's easy to overlook it. So, here are the new arrivals.

Changed defaults between MySQL 5.6 and 5.7

MySQL 5.7 comes with many changes. Some of them are better explained than others.

I wanted to see how many changes I could get by comparing SHOW VARIABLES in MySQL 5.6 and 5.7.
The most notable ones are:

binlog_format: the default is now ROW. This variable affects the format of the binary log, whether you use it as a backup complement or for replication, the change means bigger binary logs and possibly side effects.
binlog_error_action now defaults to ABORT_SERVER. If the server cannot write to the binary log, rather than continuing its work without logging, it shuts down. This could be a desirable course of action, but better be prepared for the eventuality.
innodb_strict_mode is enabled by default, which is probably a good thing, but it means that previously accepted events will now generate an error instead than a warning.
sql_mode is now STRICT by default. While many well prepared users will be pleased with this change, which was advocated as best practice by some DBAs, the practical outcome is that several exiting applications may break because of unclean input.
sync_binlog, which affects data safety but also server performance is enabled.

In search of cleanliness : the elusive quiet installation

UPDATE: Almost solved! See at the end.

A clean installation of a database server is one where everything goes according to the expectations. It used to be easy: you only had to do what the manual says, and, presto! you would see your database server installed and ready to use. If something went wrong, you got one or more error messages that informed you of what needs to be fixed.

Sometimes, rarely, it happened that you got also a warning message, telling you that while the installation was successful, you could improve it by fine tuning this and that. No big deal.

Gone are those times. A clean installation nowadays is a much harder exercise, if not impossible. Let’s give it a try using MySQL 5.7.7.

Yet another MySQL 5.7 silent change

When a new version of MySQL appears, the first source of information for the brave experimenter is a page in the manual named What is new in MySQL X.X, also known as MySQL in a nutshell. For MySQL 5.7, the in-a-nutshell page lists quite a lot of changes. In that page, the list of removed features is enough to send a chill down the spine of most any DBA. Some of the items in the deprecation section are removals in disguise, as they require immediate, rather than delayed, action to use the new version with existing application (SET PASSWORD comes to mind immediately.)

In all that abundance of changes, there is at least one that may cause huge discomfort to many users:

mysql [localhost] {msandbox} (information_schema) > select @@version;  
+-----------+  
| @@version |  
+-----------+  
| 5.7.8-rc  |  
+-----------+  
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (information_schema) > select * from GLOBAL_STATUS;  
Empty set, 1 warning (0.00 sec)

mysql [localhost] {msandbox} (information_schema) > show warnings\G  
*************************** 1. row ***************************  
  Level: Warning  
   Code: 1287  
Message: 'INFORMATION_SCHEMA.GLOBAL_STATUS' is deprecated and will be removed in  
a future release. Please use performance_schema.global_status instead  
1 row in set (0.00 sec)

MySQL 5.7.8 : features, bugs and rumors

I’ve had a look at a preview release of MySQL 5.7.8, some time before it became available to the general public (perks and duties of an Oracle ACE) and I found a few interesting things among the release notes and the tarball itself:

There is a new tool named mysqlpump, which is intended as a replacement for mysqldump, with parallel processing, compression, progress watch, the long awaited ability of excluding databases or tables, and more.
The json functionality has been fished out from the labs and added to the main distribution.

I was initially pleased with this preview when it installed without errors in MySQL Sandbox, but my pleasure didn’t last long, as I soon stumbled upon a bug that affects replication. Until MySQL 5.7.7, a user with just REPLICATION SLAVE privileges can activate replication. In MySQL 5.7.8, though, replication fails with a baffling error:

slave1 [localhost] {msandbox} ((none)) > show slave status\G
[...]
            Last_IO_Errno: 1142
            Last_IO_Error: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: SELECT command denied to user 'rsandbox'@'localhost' for table 'global_variables'

Monday, August 31, 2015

Parallel replication overview

Tuesday, August 25, 2015

Monday, August 24, 2015

INSTALLATION

MULTI-SOURCE

Wednesday, August 19, 2015

Introducing star topology.

Monday, August 17, 2015

Introduction to P2P all-masters topology

Friday, August 14, 2015

Introduction: where we stand

Wednesday, August 12, 2015

Crash-safe tables and Global transaction identifiers in MySQL 5.6 and 5.7

Monday, August 10, 2015

Thursday, August 06, 2015

Wednesday, August 05, 2015

Tuesday, August 04, 2015

Monday, August 03, 2015

Pages

Giuseppe Maxia

Like my posts? Buy me a book!

Julian's force

Yes, I am one of those

Sapere, saper fare, fare, far sapere

Blog Archive

Planet MySQL