Monday, February 14, 2011

How to detect if a MySQL server is an active replication slave

Sometimes you know for sure. And sometimes you wonder: Is this server part of a replication system? And, most specifically, is it an active slave?
The completeness of the answer depends on how much visibility you have on the server.
If you can ask the DBA, and possibly have access to the server data directory and configuration file, you can get a satisfactory answer. But if your access is limited to SQL access, things get a bit more complicated.
If you have the SUPER or REPLICATION_CLIENT privilege, then it's easy, at least in the surface.
SHOW SLAVE STATUS will tell you if the slave is running. An empty set means that the server was not configured as a slave.
The answer is not absolute, though. You need to read the output of SHOW SLAVE STATUS to understand if replication is under way.
For example, what is the difference between these two listings?

## listing 1
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: QA1
                  Master_User: tungsten_slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: QA2-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 106
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
## Listing 2
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: QA1
                  Master_User: test
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: QA2-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 125
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
They look almost the same, and yet the similarity is deceiving. The first listing is what you get immediately after a call to CHANGE MASTER TO. If you run START SLAVE at this point, replication will start flowing.
The second listing is what you get immediately after a call to RESET SLAVE. The crucial difference is that RESET SLAVE removes the two .info files containing replication credentials and positions. A call to START SLAVE in this scenario will only get you an error, as the slave does not know where and how to connect.
So, in this case, SQL visibility does only tell you that the server is not receiving replication date, and that it was at least once configured as a slave. The telltale detail is the user name ("test") that should give you a hint of something fishy going on. Unless you have called your user "test", in which case you were asking for trouble. I would say that this situation is a bug. RESET SLAVE should remove every memory of the slave configuration, and instead it keeps only the host name. Although it is not clear in this particular example, it also forgets the master connection port.

Now, if your purpose was to set replication with different coordinates, the good news is that in both cases a well formed call (*) to CHANGE MASTER TO will do what you expect, i.e. it will establish the credentials to the master, so that a further invocation of START SLAVE will let replication data flow.

(*) By "well formed" I mean a call that includes host, port, username, password, binary log file and position, and eventually all the information that you need to get the slave at work.

No comments: