一条SQL引起的MySQL主从复制报错1205

大家都知道DBA就像是消(背)防(锅)员(侠),因为前端应用还有开发上线的新版本都会影响到位于最底层的数据库,前方稍微有些风吹草动,就能反应在数据库的性能上。但是有的时候SQL不仅能决定数据库的性能,还能决定数据库的生死,今天的案例是开发的一条SQL,引起的MySQL数据库主从复制报错,vip漂移问题。

今天下午有客户反馈发消息有点慢,手机收到几条报警:

nagios报警如下大致意思就是vip发生了漂移

**** Nagios ***** 

Notification Type: PROBLEM 

Service: check_ha_mysqld 
Host: prod-bjuc-mysql1-vip 
Address: 192.168.87.74 
State: CRITICAL 

Date/Time: Wed Mar 28 14:39:27 CST 2018 

Additional Info: 

CRITICAL: HA Failover from 192.168.87.72 to 192.168.87.123,         
192.168.87.123 result:PROCS OK: 1 processes with command name /usr/sbin/mysqld, args  VIP=192.168.87.74, 192.168.87.72:PROCS OK: 1 processes with command name /usr/sbin/mysqld
Slave_SQL线程停止工作
***** Nagios ***** 

Notification Type: PROBLEM 

Service: check_mysql_health_slave-sql-running 
Host: bjuc-mysql1 
Address: 192.168.87.72 
State: CRITICAL 

Date/Time: Wed Mar 28 14:46:06 CST 2018 

Additional Info: 

CRITICAL - Slave sql is not running

登录数据库服务器查看主从复制的状态,发现报错1205错

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.87.123
                  Master_User: zhu
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000481
          Read_Master_Log_Pos: 218802975
               Relay_Log_File: mysql-relay-bin.000428
                Relay_Log_Pos: 2428110
        Relay_Master_Log_File: mysql-bin.000481
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: performance_schema,information_schema,test,mysql
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: %.%
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1205
                   Last_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. 
                               Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 182088298
              Relay_Log_Space: 39142986
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1205
               Last_SQL_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. 
                               Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)'
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 6
1 row in set (0.00 sec)

查看错误日志有一下报错信息:

错误日志中给出了报错原因和解决方法

tail -70 error.log

180328 14:36:05 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:36:21 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:36:38 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:36:56 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:37:15 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:37:35 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:37:56 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:38:17 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:38:38 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:38:59 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:39:20 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205
180328 14:39:20 [ERROR] Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
180328 14:39:20 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
180328 14:39:20 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000481' position 182088298

查看参数结合着两个参数可以知道主从复制报错的原因为:

180328 14:39:20 [ERROR] Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.

180328 14:39:20 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205

180328 14:39:20 [ERROR] Error running query, slave SQL thread aborted,Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000481' position 182088298

因为表notice_status_new产生了锁,阻塞了这条update语句

参数innodb_lock_wait_time 设置的为15秒,单个事务在等待15秒后开始报1025错:因为锁执行超时并重启事务

参数slave_transaction_retries 设置的为10次,如果事务重试次数超过10次,复制中断。(但是错误日志中尝试了11次)

mysql> show variables like '%innodb_lock_wait_timeout%';

+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 15    |
+--------------------------+-------+
1 row in set (0.01 sec)

mysql> show variables like '%slave_transaction_retries%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| slave_transaction_retries | 10    |
+---------------------------+-------+
1 row in set (0.00 sec)

解决方法:重启slave后主从复制恢复,并开始追日志

mysql> stop slave;
Query OK, 0 rows affected (0.02 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.87.123
                  Master_User: zhu
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000481
          Read_Master_Log_Pos: 451125456
               Relay_Log_File: mysql-relay-bin.000428
                Relay_Log_Pos: 9541548
        Relay_Master_Log_File: mysql-bin.000481
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: performance_schema,information_schema,test,mysql
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: %.%
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 189201736
              Relay_Log_Space: 271465769
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 5184
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 6
1 row in set (0.00 sec)

vip的问题等到晚上在切回来,线上架构为MM+heartbeat

---------------------------------我是分割线--------------------------

这次问题虽然解决了,但是发现问题我们一定要找到根音,这样问题才能算是闭环。

为什么会产生这个锁呢,因为记录锁信息的表都是实时的,现在查不出当时的锁信息,但是从存储引擎的运行状态中还是能发现一些问题,数据库出现了死锁!!显然在update这张表一定会等待的!!

发现了一条奇葩SQL,被锁的那张表也是notice_status_new

UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7384662 AND user_id IN(7359498)

show engine innodb status\G 
截取死锁的这部分
------------------------
LATEST DETECTED DEADLOCK
------------------------
180328 14:33:52
*** (1) TRANSACTION:
TRANSACTION 1B285E9F9, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1
MySQL thread id 3972036, OS thread handle 0x7fc0c3464700, query id 2525495433 192.168.87.42 dstatusnet Updating
UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7384662 AND user_id IN(7359498)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9F9 lock_mode X locks rec but not gap waiting
Record lock, heap no 241 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 000000000027d25f; asc      ' _;;
 1: len 4; hex 80704c0a; asc  pL ;;
 2: len 4; hex 80b1a0fe; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 1B285E9FD, ACTIVE 0 sec starting index read, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1
MySQL thread id 3867908, OS thread handle 0x7fbfc6d4f700, query id 2525495437 192.168.87.42 dstatusnet Updating
UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7359498 AND user_id IN(7384662)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9FD lock_mode X locks rec but not gap
Record lock, heap no 241 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 000000000027d25f; asc      ' _;;
 1: len 4; hex 80704c0a; asc  pL ;;
 2: len 4; hex 80b1a0fe; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9FD lock_mode X locks rec but not gap waiting
Record lock, heap no 242 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 000000000027d25f; asc      ' _;;
 1: len 4; hex 8070ae56; asc  p V;;
 2: len 4; hex 80b1a0fc; asc     ;;

*** WE ROLL BACK TRANSACTION (2)

问题找到了,找对该业务的开发,沟通后决定在周六上线新版本,彻底解决问题。

可以写一个实时监控锁的脚本,把内容输出打印,这样方便以后定位问题!

原文地址:http://blog.51cto.com/11784929/2092097

时间: 2024-10-12 13:29:02

一条SQL引起的MySQL主从复制报错1205的相关文章

配置MySQL主从复制报错Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work

配置MySQL主从复制报错 Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this do

mysql主从复制报错 :Incorrect usage of DB GRANT and GLOBAL PRIVILEGES

在配置mysql主从复制时,想通过 grant replication slave on bbs.* to 'bbs'@'192.168.1.3' identified by '123456'; 来限制主从复制只作用于bbs数据库,但是上面的语句会报错:Incorrect usage of DB GRANT and GLOBAL PRIVILEGES 因为replication slave 的级别是global,所以不能只作用于某一数据库,而是全局,如下图说明: 所以还是要通过 grant re

mysql主从复制报错

Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work. 删除  /var/lib/mysql/auto.cnf ,因为在auto.cnf中的server-uuid是一样的 自动判断 中文 中文(简体) 中文(香港) 中文(繁体) 英语 日语 朝鲜语 德语

插入mysql语句报错:1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near

插入一个很简单的sql语句时候,mysql一直报错: [SQL] INSERT INTO ORDER ( id, activity_id, order_type, phone, order_amount, order_state, pay_type)VALUES ( '4', '2121313', '1', '13552444989', '1', '1', '1' ) [Err] 1064 - You have an error in your SQL syntax; check the man

MySQL server has gone away(在执行sql的时候,莫名的报错)

原文:https://cenalulu.github.io/mysql/mysql-has-gone-away/ MySQL Server has gone away报错原因汇总分析 原因1. MySQL 服务宕了 原因2. 连接超时 原因3. 进程在server端被主动kill 原因4. Your SQL statement was too large. 本文将总结和汇总MySQL Server has gone away这类报错发生的原因 背景:在平时和开发的交流 以及 在论坛回答问题的或称

mysql插入报错:java.sql.SQLException: Incorrect string value: '\xE6\x9D\xAD\xE5\xB7\x9E...' for column 'address' at row 1

界面报错: 日志报错: java.sql.SQLException: Incorrect string value: '\xE6\x9D\xAD\xE5\xB7\x9E...' for column 'address' at row 1 at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129) ~[mysql-connector-java-8.0.17.jar:8.0.17] at com.mys

PHP多次调用Mysql存储过程报错解决办法

PHP多次调用Mysql数据库的存储过程会出现问题,主要问题为存储过程中执行多次SQL语句不能一一释放导致的,网上找了一些解决办法,比如使用 multi_query 然后一个一个释放,但是发现根本不适合我们的项目,我们使用CI框架写的,更多的是使用CI的数据库处理方法.所以只能另辟蹊径. 一次偶然,把Mysql链接方式改成了mysqli,两种不同的PHP连接mysql的扩展,官方在高版本中推荐使用mysqli,结果却奇迹般好了,使用Mysql长连接也行,天意么? PHP多次调用Mysql存储过程

【Mysql】—— 报错:Can't call commit when autocommit=true

java.sql.SQLException: Can't call commit when autocommit=true at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931) at com.mysql.jdbc.ConnectionImpl.commit(ConnectionImpl.jav

mysql导入报错 ERROR 1286 (42000) at line 1: Unknown table engine 'InnoDB'

[[email protected] mysql]# mysql -u root -p'123456'  jiang < schema.sql                 ERROR 1286 (42000) at line 1: Unknown table engine 'InnoDB' mysql> select * from ENGINES; +------------+---------+-----------------------------------------------