当复制中断的时候,我们常用的方法是跳过错误,比如SET GLOBAL SQL_SLAVE_SKIP_COUNTER
=1或者直接slave-skip-errors=1062,这样确实解决了问题,恢复了复制。但是久而久之主从数据相差就很大了。对于复制正常以后,我们还需要使用pt-table-checksum数据效验,以及pt-table-sync同步数据,今天线上一个从库中断,查看了错误,发现是主键冲突,至于为何会发现这些情况,请阅读MySQL
Replication需要注意的问题
Last_Error: Error ‘Duplicate entry ‘192442‘ for key ‘PRIMARY‘‘ on query. Default database: ‘xxxxxxxxxxx‘. Query: ‘INSERT INTO xxxxxxxxxxxxx(playerId, `type`, `count`)
VALUES( NAME_CONST(‘pPlayerId‘,629014986), NAME_CONST(‘pType‘,8), 0)‘
Skip_Counter: 0
Exec_Master_Log_Pos: 1499475
Relay_Log_Space: 4829077512
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Error ‘Duplicate entry ‘192442‘ for key ‘PRIMARY‘‘ on query. Default database: ‘xxxxxxxxxxxxxx‘. Query: ‘INSERT INTO xxxxxxxxxxx(playerId, `type`, `count`)
VALUES( NAME_CONST(‘pPlayerId‘,629014986), NAME_CONST(‘pType‘,8), 0)‘
1 row in set (0.00 sec)
于是根据提示,以及查看表结构,发现果然是主键冲突了,但是表的记录却不一致,所以往往我们跳过错误就会导致主从数据不一致的问题。
主库上的记录:
mysql> select * from xxxx.xxxx where id=192442;
+--------+-----------+------+-------+-------+---------------+
| id | playerId | type | count | total | lastResetTime |
+--------+-----------+------+-------+-------+---------------+
| 192442 | 629014986 | 8 | 0 | 0 | 0 |
+--------+-----------+------+-------+-------+---------------+
1 row in set (0.00 sec)mysql>
从库上的记录:
mysql> select * from xxxx.xxxx where id=192442;
+--------+-----------+------+-------+-------+---------------+
| id | playerId | type | count | total | lastResetTime |
+--------+-----------+------+-------+-------+---------------+
| 192442 | 629015414 | 8 | 0 | 0 | 0 |
+--------+-----------+------+-------+-------+---------------+
1 row in set (0.00 sec)mysql>
有时候我会手动删除从库上提示的相应记录,但是往往是比较麻烦的。因为可能不止一条记录重复,可能N条,所以我就简单的写了一个脚本,删除主键冲突的记录,这个脚本只适合主键冲突的情况。
#!/bin/bash
#Delete duplicate records primary key conflict
#Write by yayun 2014-05-17mysql=/usr/local/mysql-5.1.66/bin/mysql
sock=/data/mysql-slave-3311/mysql.sock
passwd=123456while true
do
SQL_THREAD=`$mysql -uroot -p$passwd -S $sock -e ‘show slave status\G‘ | egrep ‘Slave_SQL_Running‘ | awk ‘{print $2}‘`
LAST_ERROR=`$mysql -uroot -p$passwd -S $sock -e ‘show slave status\G‘ | egrep Last_Errno | awk ‘{print $2}‘`
duplicate=`$mysql -uroot -p$passwd -S $sock -e ‘show slave status\G‘ | grep Last_Error | awk ‘/Duplicate entry/{print $5}‘ | awk -F "‘" ‘{print $2}‘`
DATABASE=`$mysql -uroot -p$passwd -S $sock -e ‘show slave status\G‘ | grep Last_Error | awk ‘{print $13}‘ | awk -F "‘" ‘{print $2}‘`
TABLE=`$mysql -uroot -p$passwd -S $sock -e ‘show slave status\G‘ | grep Last_Error | awk -F ":" ‘{print $4}‘ | awk -F "(" ‘{print $1}‘ | awk ‘{print $NF}‘`$mysql -uroot -p$passwd -S $sock -e ‘show slave status\G‘ | grep HA_ERR_FOUND_DUPP_KEY
if [ $? -eq 1 ]
then
if [ "$SQL_THREAD" == No ] && [ "$LAST_ERROR" == 1062 ]
then
FILED=`$mysql -uroot -p$passwd -S $sock -Nse "desc $DATABASE.$TABLE" | grep PRI | awk ‘{print $1}‘`
$mysql -uroot -p$passwd -S $sock -e "delete from $DATABASE.$TABLE where $FILED=$duplicate"
$mysql -uroot -p$passwd -S $sock -e "start slave sql_thread"
else
echo "====================== ok ========================"
$mysql -uroot -p$passwd -S $sock -e ‘show slave status\G‘ | egrep ‘Slave_.*_Running‘
echo "====================== ok ========================"
break
fi
fi
done
脚本随便写的。大家可以再自己相应的扩展一下。^_^