See Poor man’s MySQL replication monitoring and Geert's Refactored: Poor man’s MySQL replication monitoring.
Having Python in your server doesn't really qualify as "poor man". In many cases it's a luxury, and thus, here's my shot at the problem, using a Bash shell script.
Unlike its Python-based competition, this version also checks that the slave is replicating from the intended master, and that it is not lagging behind.
#!/bin/bash USERNAME=msandbox PASSWORD=msandbox EXPECTED_MASTER_HOST=127.0.0.1 EXPECTED_MASTER_PORT=27371 SLAVE_HOST=127.0.0.1 SLAVE_PORT=27372 MYSQL="mysql -u $USERNAME -p$PASSWORD " MASTER="$MYSQL -h $EXPECTED_MASTER_HOST -P $EXPECTED_MASTER_PORT" SLAVE="$MYSQL -h $SLAVE_HOST -P $SLAVE_PORT" $MASTER -e 'SHOW MASTER STATUS\G' > mstatus $SLAVE -e 'SHOW SLAVE STATUS\G' > sstatus function extract_value { FILENAME=$1 VAR=$2 grep -w $VAR $FILENAME | awk '{print $2}' } Master_Binlog=$(extract_value mstatus File ) Master_Position=$(extract_value mstatus Position ) Master_Host=$(extract_value sstatus Master_Host) Master_Port=$(extract_value sstatus Master_Port) Master_Log_File=$(extract_value sstatus Master_Log_File) Read_Master_Log_Pos=$(extract_value sstatus Read_Master_Log_Pos) Slave_IO_Running=$(extract_value sstatus Slave_IO_Running) Slave_SQL_Running=$(extract_value sstatus Slave_SQL_Running) ERROR_COUNT=0 if [ "$Master_Host" != "$EXPECTED_MASTER_HOST" ] then ERRORS[$ERROR_COUNT]="the slave is not replicating from the host that it is supposed to" ERROR_COUNT=$(($ERROR_COUNT+1)) fi if [ "$Master_Port" != "$EXPECTED_MASTER_PORT" ] then ERRORS[$ERROR_COUNT]="the slave is not replicating from the host that it is supposed to" ERROR_COUNT=$(($ERROR_COUNT+1)) fi if [ "$Master_Binlog" != "$Master_Log_File" ] then ERRORS[$ERROR_COUNT]="master binlog ($Master_Binlog) and Master_Log_File ($Master_Log_File) differ" ERROR_COUNT=$(($ERROR_COUNT+1)) fi POS_DIFFERENCE=$(echo ${Master_Position}-$Read_Master_Log_Pos|bc) if [ $POS_DIFFERENCE -gt 1000 ] then ERRORS[$ERROR_COUNT]="The slave is lagging behind of $POS_DIFFERENCE" ERROR_COUNT=$(($ERROR_COUNT+1)) fi if [ "$Slave_IO_Running" == "No" ] then ERRORS[$ERROR_COUNT]="Replication is stopped" ERROR_COUNT=$(($ERROR_COUNT+1)) fi if [ "$Slave_SQL_Running" == "No" ] then ERRORS[$ERROR_COUNT]="Replication (SQL) is stopped" ERROR_COUNT=$(($ERROR_COUNT+1)) fi if [ $ERROR_COUNT -gt 0 ] then EMAIL=myname@gmail.com SUBJECT="ERRORS in replication" BODY='' CNT=0 while [ "$CNT" != "$ERROR_COUNT" ] do BODY="$BODY ${ERRORS[$CNT]}" CNT=$(($CNT+1)) done echo $SUBJECT echo $BODY echo $BODY | mail -s "$SUBJECT" $EMAIL else echo "Replication OK" printf "file: %s at %'d\n" $Master_Log_File $Read_Master_Log_Pos fi
5 comments:
Thanks Giuseppe!
See you at the conference next week. Looking forward to seeing you again.
Thanks for this. I added "as on `date`" to all error messages to get the timestamp.
I reused the code above and added a locking file.
#!/bin/bash
USERNAME=msandbox
PASSWORD=msandbox
EXPECTED_MASTER_HOST=127.0.0.1
EXPECTED_MASTER_PORT=27371
SLAVE_HOST=127.0.0.1
SLAVE_PORT=27372
MYSQL="mysql -u $USERNAME -p$PASSWORD "
MASTER="$MYSQL -h $EXPECTED_MASTER_HOST -P $EXPECTED_MASTER_PORT"
SLAVE="$MYSQL -h $SLAVE_HOST -P $SLAVE_PORT"
$MASTER -e 'SHOW MASTER STATUS\G' > mstatus
$SLAVE -e 'SHOW SLAVE STATUS\G' > sstatus
function extract_value {
FILENAME=$1
VAR=$2
grep -w $VAR $FILENAME | awk '{print $2}'
}
Master_Binlog=$(extract_value mstatus File )
Master_Position=$(extract_value mstatus Position )
Master_Host=$(extract_value sstatus Master_Host)
Master_Port=$(extract_value sstatus Master_Port)
Master_Log_File=$(extract_value sstatus Master_Log_File)
Read_Master_Log_Pos=$(extract_value sstatus Read_Master_Log_Pos)
Slave_IO_Running=$(extract_value sstatus Slave_IO_Running)
Slave_SQL_Running=$(extract_value sstatus Slave_SQL_Running)
ERROR_COUNT=0
if [ "$Master_Host" != "$EXPECTED_MASTER_HOST" ]
then
ERRORS[$ERROR_COUNT]="the slave is not replicating from the host that it is supposed to"
ERROR_COUNT=$(($ERROR_COUNT+1))
fi
if [ "$Master_Port" != "$EXPECTED_MASTER_PORT" ]
then
ERRORS[$ERROR_COUNT]="the slave is not replicating from the host that it is supposed to"
ERROR_COUNT=$(($ERROR_COUNT+1))
fi
if [ "$Master_Binlog" != "$Master_Log_File" ]
then
ERRORS[$ERROR_COUNT]="master binlog ($Master_Binlog) and Master_Log_File ($Master_Log_File) differ"
ERROR_COUNT=$(($ERROR_COUNT+1))
fi
POS_DIFFERENCE=$(echo ${Master_Position}-$Read_Master_Log_Pos|bc)
if [ $POS_DIFFERENCE -gt 1000 ]
then
ERRORS[$ERROR_COUNT]="The slave is lagging behind of $POS_DIFFERENCE"
ERROR_COUNT=$(($ERROR_COUNT+1))
fi
if [ "$Slave_IO_Running" == "No" ]
then
ERRORS[$ERROR_COUNT]="Replication is stopped"
ERROR_COUNT=$(($ERROR_COUNT+1))
fi
if [ "$Slave_SQL_Running" == "No" ]
then
ERRORS[$ERROR_COUNT]="Replication (SQL) is stopped"
ERROR_COUNT=$(($ERROR_COUNT+1))
fi
if [ $ERROR_COUNT -gt 0 ]
then
EMAIL=myname@gmail.com
SUBJECT="ERRORS in replication"
BODY=''
CNT=0
while [ "$CNT" != "$ERROR_COUNT" ]
do
BODY="$BODY ${ERRORS[$CNT]}"
CNT=$(($CNT+1))
done
echo $SUBJECT
echo $BODY
echo $BODY | mail -s "$SUBJECT" $EMAIL
else
echo "Replication OK"
fi
...
This is a wonderful script, the fact this it is a pure bash script with no reliance on Python or any other packages makes it perfect for me.
I use ssmtp for message sending, so I'll make some slight modifications and repost on my own little blog (with attribution to the original author of course!)
Thanks!
Noveck
Thanks Giuseppe!
This script is working fine for me.
how can i configure to get get mail even though replication is working fine.
I want to run it through the cronjob and get the status of slave. running or not.
Kindly help me with this.
Post a Comment