现象:
1、从mysqlbackup 的日志上来看是它一直处于state: Waiting for locks;
2、从mysql 层面show processlist 上看它的处于waiting for gloabl read lock
show processlist; +-----+-------------+---------------------+------+---------+--------+----------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+-------------+---------------------+------+---------+--------+----------------------------------+------------------+ | 331 | system user | | NULL | Connect | 215831 | Waiting for master to send event | NULL | | 332 | system user | | NULL | Connect | 252054 | Reading event from the relay log | NULL | | 336 | ha_op | 127.0.0.1:36246 | NULL | Sleep | 1 | | NULL | | 337 | ha_op | 127.0.0.1:36247 | NULL | Sleep | 1 | | NULL | | 339 | ha_op | 192.168.56.39:44672 | NULL | Sleep | 8816 | | NULL | | 340 | backup | 127.0.0.1:36267 | NULL | Sleep | 3954 | | NULL | | 344 | root | 127.0.0.1:36310 | NULL | Query | 0 | init | show processlist | | 345 | backup | 127.0.0.1:36337 | NULL | Sleep | 1134 | | NULL | +-----+-------------+---------------------+------+---------+--------+----------------------------------+------------------+
分析:
1、是不是有别的事务持有了X锁,使得备份用户的flush tables with read lock 一直等待
select * from information_schema.innodb_trx \G *************************** 1. row *************************** trx_id: 82557697 trx_state: RUNNING trx_started: 2016-08-28 13:53:11 trx_requested_lock_id: NULL trx_wait_started: NULL trx_weight: 153349 trx_mysql_thread_id: 332 trx_query: NULL trx_operation_state: starting index read trx_tables_in_use: 296 trx_tables_locked: 296 trx_lock_structs: 5759 trx_lock_memory_bytes: 538152 trx_rows_locked: 295679 trx_rows_modified: 147590 trx_concurrency_tickets: 0 trx_isolation_level: REPEATABLE READ trx_unique_checks: 1 trx_foreign_key_checks: 1 trx_last_foreign_key_error: NULL trx_adaptive_hash_latched: 0 trx_adaptive_hash_timeout: 10000 trx_is_read_only: 0 trx_autocommit_non_locking: 0 1 row in set (0.01 sec)
2、由(分析.1)可以看出是有一事务阻塞了备份用户下发的flush tables with read lock命令、 而且可以知道这个事务是332这个session发起的;
再结合(现象.2)可以知道332 是SQL进程
3、由(分析.2)已经知道是复制阻塞了备份,所以接下来是要去解析当前正slave 正在应用的binlog 用于更进一步的确定问题原因
show slave status 看到Relay_Master_Log_File: mysql-bin.000314、Exec_Master_Log_Pos: 216304949
4、解析binlog 后发现slave 在执行delete 操作
# at 216304997 #160826 14:49:02 server id 192694598 end_log_pos 216305071 CRC32 0x416f0b3c Query thread_id=16381 exec_time=0 error_code=0 SET TIMESTAMP=1472194142/*!*/; BEGIN /*!*/; # at 216305071 #160826 14:49:02 server id 192694598 end_log_pos 216305306 CRC32 0x19b5309b Table_map: `dcsdba`.`pre_merge_cdr_gsm` mapped to number 147 # at 216305306 #160826 14:49:02 server id 192694598 end_log_pos 216313252 CRC32 0xd8102d41 Delete_rows: table id 147 # at 216313252 #160826 14:49:02 server id 192694598 end_log_pos 216321207 CRC32 0xd73f1e3f Delete_rows: table id 147 # at 216321207 #160826 14:49:02 server id 192694598 end_log_pos 216329247 CRC32 0x6331d53b Delete_rows: table id 147 # at 216329247 #160826 14:49:02 server id 192694598 end_log_pos 216337393 CRC32 0xecd5041e Delete_rows: table id 147 # at 216337393 #160826 14:49:02 server id 192694598 end_log_pos 216345555 CRC32 0x6284a900 Delete_rows: table id 147
5、进一步确认问题原因是因为pre_merge_cdr_gsm表没有主键,在这种情况下slave 每删除一行都会对应一次全表扫描。
建议:
1、给pre_merge_cdr_gsm表加上一个合适的主键
时间: 2024-10-11 12:26:22