[PATCH] MDEV-21322: Report slave progress to the master
From: Brandon Nesterenko <brandon.nesterenko@mariadb.com>

This patch extends the command `SHOW REPLICA HOSTS` with three
columns:

  1) Gtid_State_Sent. This represents the latest GTIDs sent to the
     replica in each domain. It is always populated, regardless of
     the semi-sync status (i.e. asynchronous connections still
     update this column with the latest GTID state sent to the
     replica).

  2) Gtid_State_Ack. For semi-synchronous connections (only), this
     column represents the last GTID in each domain that the replica
     has acknowledged.

  3) Sync_Status. This value represents the synchronization status
     of the replica, and is used to help determine how to interpret
     the Gtid_State_Ack column. There are four possible values:

    3.1) Initializing: The binlog dump thread is still
         initializing, and has not yet determined the
         synchronization status of the replica.

    3.2) Asynchronous: The replica is not configured for semi-sync
         replication, so Gtid_State_Ack should always be empty.

    3.3) Semi-sync Stale: The replica is configured for semi-sync
         replication, but connected using an old state, and is not
         yet able to send ACKs for new transactions. Functionally,
         this means that the primary will try to bring the replica
         up to date by sending transactions which will not be
         ACKed. Additionally, the value shown by Gtid_State_Ack
         will be empty until the replica catches up and ACKs its
         first transaction.

    3.4) Semi-sync Active: The replica is configured for semi-sync
         replication, and is sending ACKs for new transactions it
         receives. It is possible for Gtid_State_Ack to be empty
         while Sync_Status is "Semi-sync Active" if no new
         transactions have been executed on the primary since the
         replica connected.

Additionally, this patch gives the configuration
rpl_semi_sync_master_timeout=0 a new meaning. That is, when the
timeout is 0:

  1) new transactions will not attempt to wait for an ACK before
     completing, and
  2) the primary will still request ACKs from the replica for new
     transactions.

This means that Gtid_State_Ack will be updated for each ACK from the
replica and Sync_Status will read as "Semi-sync Active". Effectively,
this creates a mode that mimics the behavior of an asynchronous
connection, while allowing one to monitor the progress at which the
primary is sending transactions to the replica via the new columns
Gtid_State_Sent and Gtid_State_Ack.

Also note that a new error message was added to account for the case
where Gtid_State_(Sent/Ack) refers to a binary log file that was
purged or cannot be found.

The overall implementation is rather simple. It leverages the
existing semi-sync framework, where the replica uses binlog file:pos
to ACK transactions, in order to infer GTID state by performing a
binlog lookup at the time `SHOW REPLICA HOSTS` is executed. In
particular, the Slave_info struct is extended to store:

  1) the binlog file:pos pair of the transaction which was last sent
     to the replica,
  2) the binlog file:pos pair that was last ACKed by the replica, and
  3) an enum to represent the Sync_Status.

This patch was initially started by @JackSlateur in PR#1427, where it
was then transferred to @an3l who buffed it out in PR#2374, and final
touches were put on by @bnestere.

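As a rough illustration of the bookkeeping described above, the following C++ sketch models the per-replica state and the timeout=0 rule. All names in it (SyncStatus, BinlogCoord, ReplicaProgress, commit_waits_for_ack) are hypothetical and are not taken from the patch. The real change extends the existing Slave_info struct and resolves each file:pos pair to a GTID state through a binlog lookup, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical names throughout; the actual patch extends Slave_info.
enum class SyncStatus { Initializing, Asynchronous, SemiSyncStale, SemiSyncActive };

// A binlog coordinate; an empty file name means "nothing recorded yet".
struct BinlogCoord {
  std::string file;
  std::uint64_t pos = 0;
  bool empty() const { return file.empty(); }
};

// Per-replica progress tracked on the primary.
struct ReplicaProgress {
  BinlogCoord sent;   // last transaction sent by the binlog dump thread
  BinlogCoord acked;  // last transaction ACKed by the replica (semi-sync only)
  SyncStatus status = SyncStatus::Initializing;

  void record_sent(const std::string &file, std::uint64_t pos) {
    sent.file = file;
    sent.pos = pos;
  }

  void record_ack(const std::string &file, std::uint64_t pos) {
    // In this sketch an asynchronous replica never ACKs.
    assert(status != SyncStatus::Asynchronous);
    acked.file = file;
    acked.pos = pos;
  }
};

// Text for the Sync_Status column, using the four values named above.
const char *sync_status_name(SyncStatus s) {
  switch (s) {
    case SyncStatus::Initializing:   return "Initializing";
    case SyncStatus::Asynchronous:   return "Asynchronous";
    case SyncStatus::SemiSyncStale:  return "Semi-sync Stale";
    case SyncStatus::SemiSyncActive: return "Semi-sync Active";
  }
  return "";
}

// With a timeout of 0 the primary still requests ACKs, but a commit
// does not wait for one (assumption: timeout is given in milliseconds).
bool commit_waits_for_ack(bool semi_sync_on, unsigned long timeout_ms) {
  return semi_sync_on && timeout_ms != 0;
}
```

Storing coordinates rather than GTIDs keeps the ACK path cheap in this model: the GTID state only has to be derived when a user runs `SHOW REPLICA HOSTS`.
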
Reviewed By:
============
<TODO>
---
 mysql-test/main/grant_master_admin.result     |   2 +-
 .../suite/rpl/r/rpl_fail_register.result      |   2 +-
 .../suite/rpl/r/rpl_mixed_ddl_dml.result      |   4 +-
 .../suite/rpl/r/rpl_show_slave_hosts.result   | 516 ++++++++++-
 .../suite/rpl/t/rpl_show_slave_hosts.cnf      |   9 +-
 .../suite/rpl/t/rpl_show_slave_hosts.test     | 808 +++++++++++++++++-
 sql/repl_failsafe.cc                          |  81 +-
 sql/semisync_master.cc                        |  58 +-
 sql/semisync_master.h                         |  93 +-
 sql/semisync_master_ack_receiver.cc           |  16 +-
 sql/share/errmsg-utf8.txt                     |   2 +
 sql/slave.cc                                  |  47 +-
 sql/sql_repl.cc                               |  66 +-
 sql/sql_repl.h                                |   3 +-
 14 files changed, 1616 insertions(+), 91 deletions(-)

diff --git a/mysql-test/main/grant_master_admin.result b/mysql-test/main/grant_master_admin.result
index bd08ade940c..97a7b4d0024 100644
--- a/mysql-test/main/grant_master_admin.result
+++ b/mysql-test/main/grant_master_admin.result
@@ -28,7 +28,7 @@ GRANT REPLICATION MASTER ADMIN ON *.* TO `user1`@`localhost`
 connect con1,localhost,user1,,;
 connection con1;
 SHOW SLAVE HOSTS;
-Server_id Host Port Master_id
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
 connection default;
 DROP USER user1@localhost;
 #
diff --git a/mysql-test/suite/rpl/r/rpl_fail_register.result b/mysql-test/suite/rpl/r/rpl_fail_register.result
index 0398220c4d0..7af07b335b3 100644
--- a/mysql-test/suite/rpl/r/rpl_fail_register.result
+++ b/mysql-test/suite/rpl/r/rpl_fail_register.result
@@ -14,7 +14,7 @@ set global debug_dbug=@old_dbug;
 connection master;
 kill DUMP_THREAD;
 show slave hosts;
-Server_id Host Port Master_id
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
 connection slave;
 start slave;
 include/rpl_end.inc
diff --git a/mysql-test/suite/rpl/r/rpl_mixed_ddl_dml.result b/mysql-test/suite/rpl/r/rpl_mixed_ddl_dml.result
index 0cee79434ee..ec1b8d46824 100644
--- a/mysql-test/suite/rpl/r/rpl_mixed_ddl_dml.result
+++ b/mysql-test/suite/rpl/r/rpl_mixed_ddl_dml.result
@@ -11,8 +11,8 @@ n
 2002
 connection master;
 show slave hosts;
-Server_id Host Port Master_id
-2 127.0.0.1 9999 1
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 127.0.0.1 9999 1 0-1-2 Asynchronous
 drop table t1;
 connection slave;
 stop slave;
diff --git a/mysql-test/suite/rpl/r/rpl_show_slave_hosts.result b/mysql-test/suite/rpl/r/rpl_show_slave_hosts.result
index 0c8903378a7..34a5450ed82 100644
--- a/mysql-test/suite/rpl/r/rpl_show_slave_hosts.result
+++ b/mysql-test/suite/rpl/r/rpl_show_slave_hosts.result
@@ -1,20 +1,504 @@
-include/master-slave.inc
-[connection master]
-connect slave2,127.0.0.1,root,,test,$SLAVE_MYPORT2,;
-connection slave2;
-RESET SLAVE;
-CHANGE MASTER TO master_host='127.0.0.1',master_port=MASTER_PORT,master_user='root', master_ssl_verify_server_cert=0;
-START SLAVE IO_THREAD;
-include/wait_for_slave_io_to_start.inc
-connection master;
+include/rpl_init.inc [topology=1->2,1->3]
+connection server_1;
 SHOW SLAVE HOSTS;
-Server_id Host Port Master_id
-3 slave2 SLAVE_PORT 1
-2 localhost SLAVE_PORT 1
-connection slave2;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 Asynchronous
+3 slave2 SLAVE2_PORT 1 Asynchronous
+connection server_3;
 include/stop_slave_io.inc
-connection master;
+connection server_1;
 SHOW SLAVE HOSTS;
-Server_id Host Port Master_id
-2 localhost SLAVE_PORT 1
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 Asynchronous
+#
+# MDEV-21322: report slave progress to the primary
+#
+#
+# 21322.0: Test case set-up
+#
+connection server_1;
+set sql_log_bin=0;
+call mtr.add_suppression("Got an error reading communication packets");
+call mtr.add_suppression("Semi-sync master failed on net_flush");
+call mtr.add_suppression("Could not read packet:.* vio_errno: 1158");
+call mtr.add_suppression("Could not write packet:.* vio_errno: 1160");
+set sql_log_bin=1;
+set @save_primary_dbug= @@global.debug_dbug;
+set @save_semisync_timeout= @@global.rpl_semi_sync_master_timeout;
+set @save_semisync_master_enabled= @@global.rpl_semi_sync_master_enabled;
+create table t1 (a int);
+connection server_2;
+set @save_s2_debug= @@GLOBAL.debug_dbug;
+set @save_semisync_server_2_enabled= @@global.rpl_semi_sync_master_enabled;
+set sql_log_bin=0;
+call mtr.add_suppression('Slave I/O: Relay log write failure: could not queue event from master.*');
+call mtr.add_suppression('Slave I/O: Replication event checksum verification failed while reading from network.*');
+call mtr.add_suppression('Replication event checksum verification failed');
+call mtr.add_suppression("Timeout waiting for reply of binlog*");
+call mtr.add_suppression('Found invalid event in binary log');
+call mtr.add_suppression('event read from binlog did not pass crc check');
+call mtr.add_suppression('Event crc check failed! Most likely there is event corruption');
+call mtr.add_suppression('Slave SQL: Error initializing relay log position: I/O error reading event at position .*, error.* 1593');
+call mtr.add_suppression("Semi-sync slave .* reply");
+set sql_log_bin=1;
+connection server_3;
+set @save_s3_debug= @@GLOBAL.debug_dbug;
+set @save_semisync_server_3_enabled= @@global.rpl_semi_sync_master_enabled;
+set sql_log_bin=0;
+call mtr.add_suppression('Slave I/O: Relay log write failure: could not queue event from master.*');
+call mtr.add_suppression('Slave I/O: Replication event checksum verification failed while reading from network.*');
+call mtr.add_suppression('Replication event checksum verification failed');
+call mtr.add_suppression("Timeout waiting for reply of binlog*");
+call mtr.add_suppression('Found invalid event in binary log');
+call mtr.add_suppression('event read from binlog did not pass crc check');
+call mtr.add_suppression('Event crc check failed! Most likely there is event corruption');
+call mtr.add_suppression('Slave SQL: Error initializing relay log position: I/O error reading event at position .*, error.* 1593');
+call mtr.add_suppression("Semi-sync slave .* reply");
+set sql_log_bin=1;
+connection server_1;
+include/save_master_gtid.inc
+connection server_2;
+include/sync_with_master_gtid.inc
+#
+# 21322.1: In a fresh replication state with semi-sync disabled,
+# the Sync_Status column should reflect an asynchronous replication
+# state, and Gtid_State_Sent/Ack should start, and only Gtid_State_Sent
+# should update with new transaction. Note only server_2 is currently
+# connected.
+#
+connection server_1;
+SHOW SLAVE HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 0-1-1 Asynchronous
+insert into t1 values (1);
+include/save_master_gtid.inc
+connection server_2;
+include/sync_with_master_gtid.inc
+connection server_1;
+# Gtid_State_Sent should be updated for new transaction
+SHOW SLAVE HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 0-1-2 Asynchronous
+#
+# 21322.2: When only the primary enables semi-sync, Sync_Status should
+# still be asynchronous
+#
+connection server_1;
+set global rpl_semi_sync_master_enabled= 1;
+show variables like 'rpl_semi_sync_master_enabled';
+Variable_name Value
+rpl_semi_sync_master_enabled ON
+SHOW SLAVE HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 0-1-2 Asynchronous
+#
+# 21322.3: Finalizing the semi-sync connection on server_2 (i.e. by
+# enabling it on the slave) should update Sync_Status to semi-sync
+# active, as the slave is up-to-date.
+#
+connection server_2;
+include/stop_slave.inc
+set global rpl_semi_sync_slave_enabled = 1;
+include/start_slave.inc
+connection server_1;
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 Semi-sync Active
+#
+# 21322.4: After new semi-sync transactions are ACKed,
+# Gtid_State_Sent/Ack should match gtid_binlog_pos, and Sync_Status
+# should read that semi-sync is active
+#
+connection server_1;
+insert into t1 values (2);
+include/save_master_gtid.inc
+connection server_2;
+include/sync_with_master_gtid.inc
+connection server_1;
+# Ensuring master gtid_binlog_pos matches Gtid_State_Sent
+# Ensuring master gtid_binlog_pos matches Gtid_State_Ack
+# Ensuring Sync_Status is semi-sync active
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 0-1-3 0-1-3 Semi-sync Active
+#
+# 21322.5: When connecting a new slave (server_id 3) which initially has
+# semi-sync disabled, SHOW SLAVE HOSTS on the master should show its
+# Sync_Status as asynchronous (while server_id 2 is still semi-sync
+# active).
+#
+connection server_3;
+include/start_slave.inc
+include/sync_with_master_gtid.inc
+connection server_1;
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 0-1-3 Asynchronous
+2 localhost SLAVE_PORT 1 0-1-3 0-1-3 Semi-sync Active
+#
+# 21322.6: Reconnecting server_3 as a semi-sync enabled replica should
+# result in a Sync_Status reflecting active semi-sync
+#
+connection server_3;
+include/stop_slave.inc
+set global rpl_semi_sync_slave_enabled = 1;
+include/start_slave.inc
+connection server_1;
+show status like 'Rpl_semi_sync_master_clients';
+Variable_name Value
+Rpl_semi_sync_master_clients 2
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 Semi-sync Active
+2 localhost SLAVE_PORT 1 0-1-3 0-1-3 Semi-sync Active
+#
+# 21322.7: New transactions on the master should update Gtid_State_Sent
+# when sent to the slave, and Gtid_State_Ack once receiving an ACK
+#
+connection server_2;
+connection server_1;
+SET @@GLOBAL.debug_dbug="+d,pause_ack_thread_on_next_ack";
+connection default;
+insert into t1 values (3);
+connection server_1;
+# waiting for pause_ack_reply_to_binlog
+SET debug_sync='now WAIT_FOR pause_ack_reply_to_binlog';
+# Waiting for Gtid_State_Sent to reflect latest transaction for all replicas..
+# Ensuring Gtid_State_Ack is not yet updated (as ACK thread is paused)
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 0-1-4 Semi-sync Active
+2 localhost SLAVE_PORT 1 0-1-4 0-1-3 Semi-sync Active
+connection server_1;
+SET debug_sync='now SIGNAL resume_ack_thread';
+connection default;
+connection server_1;
+# Waiting for Gtid_State_Ack to reflect latest transaction for all replicas..
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 0-1-4 0-1-4 Semi-sync Active
+2 localhost SLAVE_PORT 1 0-1-4 0-1-4 Semi-sync Active
+# Reset debug state
+SET @@GLOBAL.debug_dbug= @save_primary_dbug;
+SET debug_sync='RESET';
+#
+# 21322.8: Holding one replica's ACK (server_2) should result in
+# Gtid_State_Ack of server_3 updating to the most recent GTID, while
+# server_id 2 has the old GTID. Note that we need to use debug_sync to
+# synchronize the ACKs of both server_2 and server_3, so server_3 can't
+# ACK the transaction before server_2's binlog dump thread sends the
+# transaction (which would negate the need for server_2 to ACK at all,
+# resulting in MTR hanging on its expected debug_sync WAIT_FOR point.)
+#
+connection server_2;
+include/stop_slave.inc
+set @@GLOBAL.debug_dbug="+d,synchronize_semisync_slave_reply";
+include/start_slave.inc
+connection server_3;
+include/stop_slave.inc
+set @@GLOBAL.debug_dbug="+d,synchronize_semisync_slave_reply";
+include/start_slave.inc
+connection server_1;
+# Waiting for master to recognize slave restart..
+insert into t1 values (4);
+connection server_2;
+set debug_sync= "now WAIT_FOR at_slave_reply";
+connection server_3;
+set debug_sync= "now WAIT_FOR at_slave_reply";
+connection default;
+# Ensure Gtid_State_Sent reflects latest transaction (0-1-5) for all replicas..
+connection server_3;
+set debug_sync= "now SIGNAL reply_ack_to_master";
+connection server_1;
+# Wait for Gtid_State_Ack to show the latest transaction for server_3..
+# Only server_3 should have ACKed the new GTID, server_2 should not due to debug_sync holding off the ACK
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 0-1-5 0-1-5 Semi-sync Active
+2 localhost SLAVE_PORT 1 0-1-5 Semi-sync Active
+connection server_2;
+# Resume slave so it can ACK the transaction
+set debug_sync= "now SIGNAL reply_ack_to_master";
+# Waiting for Gtid_State_Ack to reflect latest transaction for all replicas..
+connection server_1;
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 0-1-5 0-1-5 Semi-sync Active
+2 localhost SLAVE_PORT 1 0-1-5 0-1-5 Semi-sync Active
+connection server_2;
+include/stop_slave.inc
+set @@GLOBAL.debug_dbug= @save_s2_debug;
+SET debug_sync='RESET';
+include/start_slave.inc
+connection server_3;
+include/stop_slave.inc
+set @@GLOBAL.debug_dbug= @save_s3_debug;
+SET debug_sync='RESET';
+include/start_slave.inc
+connection server_1;
+include/save_master_gtid.inc
+connection server_2;
+include/sync_with_master_gtid.inc
+connection server_3;
+include/sync_with_master_gtid.inc
+#
+# 21322.9: If a server is behind when connecting to a primary (i.e. the
+# primary has newer transactions), the replica Sync_Status should
+# present as semi-sync stale and the Gtid_State_Ack should not populate
+# until it has reached Semi-Sync Active
+#
+connection server_2;
+include/stop_slave.inc
+connection server_1;
+insert into t1 values (5);
+include/save_master_gtid.inc
+# Pause dump_thread of server_2 (server_3 won't be affected as it has
+# already successfully ACKed the new transaction)
+set @@global.debug_dbug= "+d,pause_dump_thread_after_sending_next_full_trx";
+connection server_2;
+include/start_slave.inc
+connection server_1;
+set debug_sync= 'now WAIT_FOR dump_thread_paused';
+# Ensure Gtid_State_Sent is updated to represent new transaction has
+# been sent to both replicas
+# Ensure Sync_Status is Semi-sync Stale for the debug_sync held dump
+# thread, as it hasn't yet got "up-to-date"
+set debug_sync= 'now SIGNAL dump_thread_continue';
+# Ensure Sync_Status will automatically update to Semi-sync Active
+# once the last stale transaction has finished sending
+set @@global.debug_dbug= @save_primary_dbug;
+SET debug_sync='RESET';
+include/save_master_gtid.inc
+connection server_2;
+include/sync_with_master_gtid.inc
+connection server_3;
+include/sync_with_master_gtid.inc
+#
+# 21322.10a: If one replica errors (i.e. server_2 by injecting
+# corrupt_gtid_event with debug_dbug), it shouldn't send its ACK,
+# and server_3 does ACK; then Gtid_State_Sent should still reflect the
+# new transaction for each replica, but Gtid_State_Ack should only be
+# updated by the successful transaction. When the errored replica
+# reconnects, it shouldn't ACK the transaction, so its on-reconnect
+# Gtid_State_Ack value should be empty, but still have a Sync_Status
+# of "Semi-sync Active" when it receives the latest transaction, as it
+# will be ready to ACK new transactions going forward.
+#
+connection server_2;
+include/stop_slave.inc
+SET @@GLOBAL.debug_dbug= "+d,corrupt_gtid_event";
+include/start_slave.inc
+# Set-up server_3 for 10b (so we don't have to restart the slave)
+connection server_3;
+include/stop_slave.inc
+set @@GLOBAL.debug_dbug="+d,synchronize_semisync_slave_reply";
+include/start_slave.inc
+connection server_1;
+# Waiting for master to recognize slave restarts..
+connection server_1;
+insert into t1 values (6);
+# Debug_sync is irrelevant to this testcase (10a) but we must do it to
+# allow server_3 to ACK now (Note debug_sync is needed for 10b)
+connection server_3;
+set debug_sync= "now WAIT_FOR at_slave_reply";
+set debug_sync= "now SIGNAL reply_ack_to_master";
+connection server_1;
+include/save_master_gtid.inc
+# Waiting for Gtid_State_Sent to reflect latest transaction for all replicas..
+connection server_2;
+include/wait_for_slave_io_error.inc [errno=1595]
+set @@GLOBAL.debug_dbug= @save_s2_debug;
+connection server_1;
+# Only server_3 should ACKed have the new GTID, server_2 should not due to corrupt_queue_event
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 0-1-7 0-1-7 Semi-sync Active
+2 localhost SLAVE_PORT 1 0-1-7 Semi-sync Active
+connection server_2;
+include/start_slave.inc
+include/sync_with_master_gtid.inc
+connection server_1;
+# With replica restarted/synced, its Gtid_State_Ack should be empty with Sync_Status semi-sync active
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 0-1-7 Semi-sync Active
+3 slave2 SLAVE2_PORT 1 0-1-7 0-1-7 Semi-sync Active
+#
+# 21322.10b: Succeeding the previous 10a test, if server_3 now stalls
+# (i.e. using debug_sync), then the previously errored server_2 should
+# receive and ACK new transactions as a "lone" replica, and update its
+# Gtid_State_* columns appropriately, whereas server_3's Gtid_State_Ack
+# column should not be updated.
+#
+insert into t1 values (7);
+include/save_master_gtid.inc
+connection server_2;
+include/sync_with_master_gtid.inc
+# Waiting for Gtid_State_Sent to reflect latest transaction for all replicas..
+connection server_3;
+set debug_sync= "now WAIT_FOR at_slave_reply";
+connection server_1;
+# Only server_2 should have ACKed the new GTID ACKed, server_3 should not due to stall
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+2 localhost SLAVE_PORT 1 0-1-8 0-1-8 Semi-sync Active
+3 slave2 SLAVE2_PORT 1 0-1-8 0-1-7 Semi-sync Active
+# Resume server_3
+connection server_3;
+set debug_sync= "now SIGNAL reply_ack_to_master";
+include/sync_with_master_gtid.inc
+SET debug_sync='RESET';
+include/stop_slave.inc
+set @@GLOBAL.debug_dbug= @save_s3_debug;
+SET debug_sync='RESET';
+include/start_slave.inc
+#
+# 21322.11: Configuration rpl_semi_sync_master_timeout of 0 should
+# have transaction behavior match asynchronous behavior (i.e. trxs don't
+# need to wait for ACKs), yet the slave should still send ACKs as a
+# normal semi-sync replica, and Gtid_State_Ack should still be updated
+# accordingly.
+#
+# The actual behavior tested in this case is as follows:
+# a) Transactions won't await ACKs to complete
+# b) Semi-sync remains ON when a transaction completes without an ACK
+# c) Gtid_State_Ack is updated accordingly for each replica's ACK (even
+# when it is behind). Here, we hold both replicas using DEBUG_SYNC to
+# not send their ACKs, meanwhile, we continue creating transactions
+# on the primary.
+# d) If a "very lagged" replica sends an ACK for a transaction from a
+# purged binlog, the Gtid_State_Ack value should be cleared, and
+# issue a warning to the user with the slave's last ACKed binlog
+# coordinate (i.e. filename and position).
+#
+connection server_1;
+set global rpl_semi_sync_master_timeout=0;
+connection server_2;
+include/stop_slave.inc
+SET @@GLOBAL.DEBUG_DBUG="+d,synchronize_semisync_slave_reply";
+include/start_slave.inc
+connection server_3;
+include/stop_slave.inc
+SET @@GLOBAL.DEBUG_DBUG="+d,synchronize_semisync_slave_reply";
+include/start_slave.inc
+connection server_1;
+# Waiting for master to recognize slave restarts..
+#
+# 21322.11.a
+connection server_1;
+insert into t1 values (8);
+# Waiting for Gtid_State_Sent to reflect latest transaction for all replicas..
+connection server_2;
+SET debug_sync='now WAIT_FOR at_slave_reply';
+connection server_3;
+SET debug_sync='now WAIT_FOR at_slave_reply';
+# Gtid_State_Ack should be empty for both replicas (as they were restarted)..
+#
+# 21322.11.b
+connection server_1;
+# Ensuring semi-sync status on primary is correct..
+#
+# 21322.11.c
+connection server_1;
+insert into t1 values (9);
+include/save_master_gtid.inc
+# Waiting for Gtid_State_Sent to reflect latest transaction for all replicas..
+connection server_1;
+# server_2 and 3 should both show an empty ACK state
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 TRX2_GTID Semi-sync Active
+2 localhost SLAVE_PORT 1 TRX2_GTID Semi-sync Active
+# Let server_2 ACK just the first transaction
+connection server_2;
+set debug_sync= "now SIGNAL reply_ack_to_master";
+set debug_sync= "now WAIT_FOR at_slave_reply";
+connection server_1;
+# Waiting for server_2 Gtid_State_Ack to reflect first transaction
+# Let server_2 ACK the second transaction
+connection server_2;
+set debug_sync= "now SIGNAL reply_ack_to_master";
+include/sync_with_master_gtid.inc
+connection server_1;
+# Waiting for server_2 Gtid_State_Ack to reflect second transaction
+# Let server_3 now ACK the first transaction
+connection server_3;
+set debug_sync= "now SIGNAL reply_ack_to_master";
+set debug_sync= "now WAIT_FOR at_slave_reply";
+connection server_1;
+# Waiting for server_3 Gtid_State_Ack to reflect first transaction
+# Let server_3 ACK the second transaction
+connection server_3;
+set debug_sync= "now SIGNAL reply_ack_to_master";
+include/sync_with_master_gtid.inc
+connection server_1;
+# Waiting for Gtid_State_Ack to reflect second transaction for both servers
+#
+# 21322.11.d
+connection server_1;
+FLUSH LOGS;
+insert into t1 values (10);
+include/save_master_gtid.inc
+# Waiting for Gtid_State_Sent to reflect latest transaction for all replicas..
+connection server_2;
+set debug_sync= "now WAIT_FOR at_slave_reply";
+connection server_3;
+set debug_sync= "now WAIT_FOR at_slave_reply";
+connection server_1;
+# server_2 and 3 should both show ACKed TRX2 (with TRX3 Sent)
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 TRX3_GTID TRX2_GTID Semi-sync Active
+2 localhost SLAVE_PORT 1 TRX3_GTID TRX2_GTID Semi-sync Active
+include/wait_for_purge.inc "master-bin.000002"
+# Master should warn that the binary log which contains the last ACKed
+# binlog coordinates has been purged, and clear Gtid_State_Ack
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 TRX3_GTID Semi-sync Active
+2 localhost SLAVE_PORT 1 TRX3_GTID Semi-sync Active
+Warnings:
+Warning 4200 Error constructing GTID state for binlog position TRX2_BINLOG_POS in file 'TRX2_BINLOG_FILE': Could not find binary log file. Probably the slave state is too old and required binlog files have been purged.
+Warning 4200 Error constructing GTID state for binlog position TRX2_BINLOG_POS in file 'TRX2_BINLOG_FILE': Could not find binary log file. Probably the slave state is too old and required binlog files have been purged.
+# Let servers ACK new transaction
+connection server_2;
+set debug_sync= "now SIGNAL reply_ack_to_master";
+connection server_3;
+set debug_sync= "now SIGNAL reply_ack_to_master";
+connection server_1;
+# Waiting for Gtid_State_Ack to reflect latest transaction for all replicas..
+# Gtid_State_Ack should now show the latest transaction GTID
+SHOW REPLICA HOSTS;
+Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
+3 slave2 SLAVE2_PORT 1 TRX3_GTID TRX3_GTID Semi-sync Active
+2 localhost SLAVE_PORT 1 TRX3_GTID TRX3_GTID Semi-sync Active
+connection server_2;
+include/sync_with_master_gtid.inc
+connection server_3;
+include/sync_with_master_gtid.inc
+#
+# MDEV-21322 Cleanup
+connection server_1;
+set @@global.debug_dbug= @save_primary_dbug;
+set @@global.rpl_semi_sync_master_timeout= @save_semisync_timeout;
+set @@global.rpl_semi_sync_master_enabled= @save_semisync_master_enabled;
+drop table t1;
+include/save_master_gtid.inc
+connection server_2;
+include/sync_with_master_gtid.inc
+include/stop_slave.inc
+set @@global.rpl_semi_sync_slave_enabled= @save_semisync_server_2_enabled;
+SET @@GLOBAL.debug_dbug= "";
+SET debug_sync='RESET';
+include/start_slave.inc
+connection server_3;
+include/sync_with_master_gtid.inc
+include/stop_slave.inc
+set @@global.rpl_semi_sync_slave_enabled= @save_semisync_server_3_enabled;
+SET @@GLOBAL.debug_dbug= "";
+SET debug_sync='RESET';
+include/start_slave.inc
+#
+# End of MDEV-21322 tests
+#
 include/rpl_end.inc
diff --git a/mysql-test/suite/rpl/t/rpl_show_slave_hosts.cnf b/mysql-test/suite/rpl/t/rpl_show_slave_hosts.cnf
index 288f0132fba..43f9f1a294d 100644
--- a/mysql-test/suite/rpl/t/rpl_show_slave_hosts.cnf
+++ b/mysql-test/suite/rpl/t/rpl_show_slave_hosts.cnf
@@ -2,19 +2,24 @@
 [mysqld.1]
 server_id=1
+log_warnings=9
 [mysqld.2]
 server_id=2
 report-host=
 report-user=
+log_slave_updates=1
 [mysqld.3]
 server_id=3
 report-host=slave2
 slave-net-timeout=5
+log_slave_updates=1
+log_bin=slave2
 [ENV]
-SLAVE_MYPORT2= @mysqld.3.port
-SLAVE_MYSOCK2= @mysqld.3.socket
+SERVER_MYPORT_1= @mysqld.1.port
+SERVER_MYPORT_2= @mysqld.2.port
+SERVER_MYPORT_3= @mysqld.3.port
diff --git a/mysql-test/suite/rpl/t/rpl_show_slave_hosts.test b/mysql-test/suite/rpl/t/rpl_show_slave_hosts.test
index afac298495f..a63c1662b81 100644
--- a/mysql-test/suite/rpl/t/rpl_show_slave_hosts.test
+++ b/mysql-test/suite/rpl/t/rpl_show_slave_hosts.test
@@ -9,17 +9,10 @@
 # Remove the "Rpl_recovery_rank" column from SHOW SLAVE HOSTS, It is not
 # implemented.
 #######################################################################
-source include/master-slave.inc;
-connect (slave2,127.0.0.1,root,,test,$SLAVE_MYPORT2,);
+--let $rpl_topology= 1->2,1->3
+--source include/rpl_init.inc
-connection slave2;
-RESET SLAVE;
---replace_result $MASTER_MYPORT MASTER_PORT
---eval CHANGE MASTER TO master_host='127.0.0.1',master_port=$MASTER_MYPORT,master_user='root', master_ssl_verify_server_cert=0
-START SLAVE IO_THREAD;
-source include/wait_for_slave_io_to_start.inc;
-
-connection master;
+connection server_1;
 let $show_statement= SHOW SLAVE HOSTS;
 let $field= Server_id;
 # 3 is server_id of slave2.
@@ -30,14 +23,13 @@ source include/wait_show_condition.inc;
 # HOSTS, when that slave is much slower to register due to thread scheduling.
 let $condition= ='2';
 source include/wait_show_condition.inc;
---replace_column 3 'SLAVE_PORT'
---replace_result $SLAVE_MYPORT SLAVE_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT
+--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT
 SHOW SLAVE HOSTS;
-connection slave2;
+connection server_3;
 --source include/stop_slave_io.inc
-connection master;
+connection server_1;
 let $show_statement= SHOW SLAVE HOSTS;
 let $field= Server_id;
 # 3 is server_id of slave2.
@@ -49,4 +41,792 @@ source include/wait_show_condition.inc;
 --replace_result $SLAVE_MYPORT SLAVE_PORT
 SHOW SLAVE HOSTS;
+
+--echo #
+--echo # MDEV-21322: report slave progress to the primary
+--echo #
+
+--echo #
+--echo # 21322.0: Test case set-up
+--echo #
+connection server_1;
+set sql_log_bin=0;
+call mtr.add_suppression("Got an error reading communication packets");
+call mtr.add_suppression("Semi-sync master failed on net_flush");
+call mtr.add_suppression("Could not read packet:.* vio_errno: 1158");
+call mtr.add_suppression("Could not write packet:.* vio_errno: 1160");
+set sql_log_bin=1;
+set @save_primary_dbug= @@global.debug_dbug;
+set @save_semisync_timeout= @@global.rpl_semi_sync_master_timeout;
+set @save_semisync_master_enabled= @@global.rpl_semi_sync_master_enabled;
+create table t1 (a int);
+
+connection server_2;
+set @save_s2_debug= @@GLOBAL.debug_dbug;
+set @save_semisync_server_2_enabled= @@global.rpl_semi_sync_master_enabled;
+set sql_log_bin=0;
+call mtr.add_suppression('Slave I/O: Relay log write failure: could not queue event from master.*');
+call mtr.add_suppression('Slave I/O: Replication event checksum verification failed while reading from network.*');
+call mtr.add_suppression('Replication event checksum verification failed');
+call mtr.add_suppression("Timeout waiting for reply of binlog*");
+call mtr.add_suppression('Found invalid event in binary log');
+call mtr.add_suppression('event read from binlog did not pass crc check');
+call mtr.add_suppression('Event crc check failed! Most likely there is event corruption');
+call mtr.add_suppression('Slave SQL: Error initializing relay log position: I/O error reading event at position .*, error.* 1593');
+call mtr.add_suppression("Semi-sync slave .* reply");
+set sql_log_bin=1;
+
+connection server_3;
+set @save_s3_debug= @@GLOBAL.debug_dbug;
+set @save_semisync_server_3_enabled= @@global.rpl_semi_sync_master_enabled;
+set sql_log_bin=0;
+call mtr.add_suppression('Slave I/O: Relay log write failure: could not queue event from master.*');
+call mtr.add_suppression('Slave I/O: Replication event checksum verification failed while reading from network.*');
+call mtr.add_suppression('Replication event checksum verification failed');
+call mtr.add_suppression("Timeout waiting for reply of binlog*");
+call mtr.add_suppression('Found invalid event in binary log');
+call mtr.add_suppression('event read from binlog did not pass crc check');
+call mtr.add_suppression('Event crc check failed! Most likely there is event corruption');
+call mtr.add_suppression('Slave SQL: Error initializing relay log position: I/O error reading event at position .*, error.* 1593');
+call mtr.add_suppression("Semi-sync slave .* reply");
+set sql_log_bin=1;
+
+--connection server_1
+--source include/save_master_gtid.inc
+--connection server_2
+--source include/sync_with_master_gtid.inc
+
+
+--echo #
+--echo # 21322.1: In a fresh replication state with semi-sync disabled,
+--echo # the Sync_Status column should reflect an asynchronous replication
+--echo # state, and Gtid_State_Sent/Ack should start, and only Gtid_State_Sent
+--echo # should update with new transaction. Note only server_2 is currently
+--echo # connected.
+--echo # +--connection server_1 +--replace_result $SLAVE_MYPORT SLAVE_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW SLAVE HOSTS; +insert into t1 values (1); +--source include/save_master_gtid.inc +--connection server_2 +--source include/sync_with_master_gtid.inc + +--connection server_1 +--echo # Gtid_State_Sent should be updated for new transaction +--replace_result $SLAVE_MYPORT SLAVE_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW SLAVE HOSTS; +--let $master_gtid= `SELECT @@gtid_binlog_pos` +--let $gtid_sent= query_get_value(show slave hosts, Gtid_State_Sent, 1) +if (`SELECT strcmp("$master_gtid","$gtid_sent") != 0`) +{ + --echo # Master gtid_binlog_pos: $master_gtid + --echo # Gtid_State_Sent: $gtid_sent + --die Master did not update Gtid_State_Sent for asynchronous replica +} + + +--echo # +--echo # 21322.2: When only the primary enables semi-sync, Sync_Status should +--echo # still be asynchronous +--echo # +--connection server_1 +set global rpl_semi_sync_master_enabled= 1; +show variables like 'rpl_semi_sync_master_enabled'; +--replace_result $SLAVE_MYPORT SLAVE_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW SLAVE HOSTS; + + +--echo # +--echo # 21322.3: Finalizing the semi-sync connection on server_2 (i.e. by +--echo # enabling it on the slave) should update Sync_Status to semi-sync +--echo # active, as the slave is up-to-date. 
+--echo # + +--connection server_2 +--source include/stop_slave.inc +set global rpl_semi_sync_slave_enabled = 1; +--source include/start_slave.inc + +--connection server_1 +let $status_var= Rpl_semi_sync_master_clients; +let $status_var_value= 1; +source include/wait_for_status_var.inc; + +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + + +--echo # +--echo # 21322.4: After new semi-sync transactions are ACKed, +--echo # Gtid_State_Sent/Ack should match gtid_binlog_pos, and Sync_Status +--echo # should read that semi-sync is active +--echo # +--connection server_1 +--let $nextval= `SELECT max(a)+1 from t1` +--eval insert into t1 values ($nextval) +--source include/save_master_gtid.inc +--connection server_2 +--source include/sync_with_master_gtid.inc + +--connection server_1 +--let $master_gtid= `SELECT @@gtid_binlog_pos` + +--echo # Ensuring master gtid_binlog_pos matches Gtid_State_Sent +--let $gtid_sent= query_get_value(show slave hosts, Gtid_State_Sent, 1) +if (`SELECT strcmp("$master_gtid","$gtid_sent") != 0`) +{ + --echo # Master gtid_binlog_pos: $master_gtid + --echo # Gtid_State_Sent: $gtid_sent + --die Master's gtid_binlog_pos should match Gtid_State_Sent, but doesn't +} + +--echo # Ensuring master gtid_binlog_pos matches Gtid_State_Ack +--let $gtid_ack= query_get_value(show slave hosts, Gtid_State_Ack, 1) +if (`SELECT strcmp("$master_gtid","$gtid_ack") != 0`) +{ + --echo # Master gtid_binlog_pos: $master_gtid + --echo # Gtid_State_Ack: $gtid_ack + --die Master's gtid_binlog_pos should match Gtid_State_Ack, but doesn't +} + +--echo # Ensuring Sync_Status is semi-sync active +--let $sync_status= query_get_value(show slave hosts, Sync_Status, 1) +if (`SELECT strcmp("$sync_status","semi-sync active") != 0`) +{ + --echo # Sync_Status: $sync_status + --die Incorrect value for Sync_Status, should be "semi-sync active" +} + +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 
SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + + +--echo # +--echo # 21322.5: When connecting a new slave (server_id 3) which initially has +--echo # semi-sync disabled, SHOW SLAVE HOSTS on the master should show its +--echo # Sync_Status as asynchronous (while server_id 2 is still semi-sync +--echo # active). +--echo # +# Initial replication state on server_3 is off +connection server_3; +--source include/start_slave.inc +--source include/sync_with_master_gtid.inc + +connection server_1; +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + +--echo # +--echo # 21322.6: Reconnecting server_3 as a semi-sync enabled replica should +--echo # result in a Sync_Status reflecting active semi-sync +--echo # + +connection server_3; +--source include/stop_slave.inc +set global rpl_semi_sync_slave_enabled = 1; +--source include/start_slave.inc + +connection server_1; +let $status_var= Rpl_semi_sync_master_clients; +let $status_var_value= 2; +source include/wait_for_status_var.inc; +show status like 'Rpl_semi_sync_master_clients'; +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + + +--echo # +--echo # 21322.7: New transactions on the master should update Gtid_State_Sent +--echo # when sent to the slave, and Gtid_State_Ack once receiving an ACK +--echo # + +--connection server_2 +let $server_2_sent_ack= query_get_value(show status like 'Rpl_semi_sync_slave_send_ack', Value, 1); + +--connection server_1 +SET @@GLOBAL.debug_dbug="+d,pause_ack_thread_on_next_ack"; + +# Write the new event +--connection default +--let $old_binlog_gtid= `SELECT @@gtid_binlog_pos` +--let $nextval= `SELECT max(a)+1 from t1` +--send_eval insert into t1 values ($nextval) + +--connection server_1 +--echo # waiting for pause_ack_reply_to_binlog +SET debug_sync='now WAIT_FOR pause_ack_reply_to_binlog'; +--let $new_binlog_gtid= 
`SELECT @@gtid_binlog_pos` + +--echo # Waiting for Gtid_State_Sent to reflect latest transaction for all replicas.. +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Sent; +let $condition= LIKE '$new_binlog_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--echo # Ensuring Gtid_State_Ack is not yet updated (as ACK thread is paused) +--let $gtid_ack= query_get_value(show slave hosts, Gtid_State_Ack, 1) +if (`SELECT strcmp("$old_master_gtid","$gtid_ack") != 0`) +{ + --echo # Master gtid_binlog_pos: $master_gtid + --echo # Gtid_State_Ack: $gtid_ack + --die Gtid_State_Ack should not yet reflect the GTID of the new transaction +} + +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + +--connection server_1 +SET debug_sync='now SIGNAL resume_ack_thread'; + +--connection default +--reap +--connection server_1 + +--echo # Waiting for Gtid_State_Ack to reflect latest transaction for all replicas.. +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= LIKE '$new_binlog_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + +--echo # Reset debug state +SET @@GLOBAL.debug_dbug= @save_primary_dbug; +SET debug_sync='RESET'; + + +--echo # +--echo # 21322.8: Holding one replica's ACK (server_2) should result in +--echo # Gtid_State_Ack of server_3 updating to the most recent GTID, while +--echo # server_id 2 has the old GTID. Note that we need to use debug_sync to +--echo # synchronize the ACKs of both server_2 and server_3, so server_3 can't +--echo # ACK the transaction before server_2's binlog dump thread sends the +--echo # transaction (which would negate the need for server_2 to ACK at all, +--echo # resulting in MTR hanging on its expected debug_sync WAIT_FOR point.) 
+--echo # + +--connection server_2 +--source include/stop_slave.inc +set @@GLOBAL.debug_dbug="+d,synchronize_semisync_slave_reply"; +--source include/start_slave.inc +--connection server_3 +--source include/stop_slave.inc +set @@GLOBAL.debug_dbug="+d,synchronize_semisync_slave_reply"; +--source include/start_slave.inc + +--connection server_1 +--echo # Waiting for master to recognize slave restart.. +let $status_var= Rpl_semi_sync_master_clients; +let $status_var_value= 2; +source include/wait_for_status_var.inc; + +--let $nextval= `SELECT max(a)+1 from t1` +--send_eval insert into t1 values ($nextval) + +--connection server_2 +set debug_sync= "now WAIT_FOR at_slave_reply"; +--connection server_3 +set debug_sync= "now WAIT_FOR at_slave_reply"; + +# New transaction is binlogged at this point, so we can query gtid_binlog_pos +--connection default +--let $binlog_gtid= `SELECT @@gtid_binlog_pos` + +--echo # Ensure Gtid_State_Sent reflects latest transaction ($binlog_gtid) for all replicas.. +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Sent; +let $condition= LIKE '$binlog_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--connection server_3 +set debug_sync= "now SIGNAL reply_ack_to_master"; + +--connection server_1 +--reap + +--echo # Wait for Gtid_State_Ack to show the latest transaction for server_3.. +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= LIKE '$binlog_gtid'; +let $wait_for_all= 0; +source include/wait_show_condition.inc; + +--echo # Only server_3 should have ACKed the new GTID, server_2 should not due to debug_sync holding off the ACK +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + +--connection server_2 +--echo # Resume slave so it can ACK the transaction +set debug_sync= "now SIGNAL reply_ack_to_master"; + +--echo # Waiting for Gtid_State_Ack to reflect latest transaction for all replicas.. 
+let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= LIKE '$binlog_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--connection server_1 +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + +--connection server_2 +--source include/stop_slave.inc +set @@GLOBAL.debug_dbug= @save_s2_debug; +SET debug_sync='RESET'; +--source include/start_slave.inc + +--connection server_3 +--source include/stop_slave.inc +set @@GLOBAL.debug_dbug= @save_s3_debug; +SET debug_sync='RESET'; +--source include/start_slave.inc + +--connection server_1 +--source include/save_master_gtid.inc +--connection server_2 +--source include/sync_with_master_gtid.inc +--connection server_3 +--source include/sync_with_master_gtid.inc + + +--echo # +--echo # 21322.9: If a server is behind when connecting to a primary (i.e. the +--echo # primary has newer transactions), the replica Sync_Status should +--echo # present as semi-sync stale and the Gtid_State_Ack should not populate +--echo # until it has reached Semi-Sync Active +--echo # + +--connection server_2 +--source include/stop_slave.inc + +--connection server_1 +--let $nextval= `SELECT max(a)+1 from t1` +--eval insert into t1 values ($nextval) +--source include/save_master_gtid.inc +--let $binlog_gtid= `SELECT @@gtid_binlog_pos` +--echo # Pause dump_thread of server_2 (server_3 won't be affected as it has +--echo # already successfully ACKed the new transaction) +set @@global.debug_dbug= "+d,pause_dump_thread_after_sending_next_full_trx"; + +--connection server_2 +--source include/start_slave.inc + +--connection server_1 +set debug_sync= 'now WAIT_FOR dump_thread_paused'; + +--echo # Ensure Gtid_State_Sent is updated to represent new transaction has +--echo # been sent to both replicas +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Sent; +let $condition= LIKE '$binlog_gtid'; +let $wait_for_all= 1; 
+source include/wait_show_condition.inc; + +--echo # Ensure Sync_Status is Semi-sync Stale for the debug_sync held dump +--echo # thread, as it hasn't yet got "up-to-date" +let $show_statement= SHOW REPLICA HOSTS; +let $field= Sync_Status; +let $condition= LIKE 'Semi-sync Stale'; +let $wait_for_all= 0; +source include/wait_show_condition.inc; + +set debug_sync= 'now SIGNAL dump_thread_continue'; + +--echo # Ensure Sync_Status will automatically update to Semi-sync Active +--echo # once the last stale transaction has finished sending +let $show_statement= SHOW REPLICA HOSTS; +let $field= Sync_Status; +let $condition= LIKE 'Semi-sync Active'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +set @@global.debug_dbug= @save_primary_dbug; +SET debug_sync='RESET'; + +--source include/save_master_gtid.inc +--connection server_2 +--source include/sync_with_master_gtid.inc +--connection server_3 +--source include/sync_with_master_gtid.inc + + +--echo # +--echo # 21322.10a: If one replica errors (i.e. server_2 by injecting +--echo # corrupt_gtid_event with debug_dbug), it shouldn't send its ACK, +--echo # and server_3 does ACK; then Gtid_State_Sent should still reflect the +--echo # new transaction for each replica, but Gtid_State_Ack should only be +--echo # updated by the successful transaction. When the errored replica +--echo # reconnects, it shouldn't ACK the transaction, so its on-reconnect +--echo # Gtid_State_Ack value should be empty, but still have a Sync_Status +--echo # of "Semi-sync Active" when it receives the latest transaction, as it +--echo # will be ready to ACK new transactions going forward. 
+--echo # +--connection server_2 +--source include/stop_slave.inc +SET @@GLOBAL.debug_dbug= "+d,corrupt_gtid_event"; +--source include/start_slave.inc + +--echo # Set-up server_3 for 10b (so we don't have to restart the slave) +--connection server_3 +--source include/stop_slave.inc +set @@GLOBAL.debug_dbug="+d,synchronize_semisync_slave_reply"; +--source include/start_slave.inc + +--connection server_1 +--echo # Waiting for master to recognize slave restarts.. +let $status_var= Rpl_semi_sync_master_clients; +let $status_var_value= 2; +source include/wait_for_status_var.inc; + +--connection server_1 +--let $nextval= `SELECT max(a)+1 from t1` +--send_eval insert into t1 values ($nextval) + +--echo # Debug_sync is irrelevant to this testcase (10a) but we must do it to +--echo # allow server_3 to ACK now (Note debug_sync is needed for 10b) +--connection server_3 +set debug_sync= "now WAIT_FOR at_slave_reply"; +set debug_sync= "now SIGNAL reply_ack_to_master"; + +--connection server_1 +--reap +--let $binlog_gtid= `SELECT @@gtid_binlog_pos` +--source include/save_master_gtid.inc + +--echo # Waiting for Gtid_State_Sent to reflect latest transaction for all replicas.. 
+let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Sent; +let $condition= LIKE '$binlog_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--connection server_2 +--let $slave_io_errno= 1595 +--source include/wait_for_slave_io_error.inc +set @@GLOBAL.debug_dbug= @save_s2_debug; + +--connection server_1 +--echo # Only server_3 should ACKed have the new GTID, server_2 should not due to corrupt_queue_event +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + +--connection server_2 +--source include/start_slave.inc +--source include/sync_with_master_gtid.inc + +--connection server_1 +--echo # With replica restarted/synced, its Gtid_State_Ack should be empty with Sync_Status semi-sync active +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + + +--echo # +--echo # 21322.10b: Succeeding the previous 10a test, if server_3 now stalls +--echo # (i.e. using debug_sync), then the previously errored server_2 should +--echo # receive and ACK new transactions as a "lone" replica, and update its +--echo # Gtid_State_* columns appropriately, whereas server_3's Gtid_State_Ack +--echo # column should not be updated. +--echo # + +--let $old_binlog_gtid= `SELECT @@gtid_binlog_pos` +--let $nextval= `SELECT max(a)+1 from t1` +--eval insert into t1 values ($nextval) +--let $new_binlog_gtid= `SELECT @@gtid_binlog_pos` +--source include/save_master_gtid.inc + +--connection server_2 +--source include/sync_with_master_gtid.inc + +--echo # Waiting for Gtid_State_Sent to reflect latest transaction for all replicas.. 
+let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Sent; +let $condition= LIKE '$new_binlog_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--connection server_3 +set debug_sync= "now WAIT_FOR at_slave_reply"; + +--connection server_1 +--echo # Only server_2 should have ACKed the new GTID ACKed, server_3 should not due to stall +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT +SHOW REPLICA HOSTS; + +--echo # Resume server_3 +--connection server_3 +set debug_sync= "now SIGNAL reply_ack_to_master"; +--source include/sync_with_master_gtid.inc +SET debug_sync='RESET'; + +--source include/stop_slave.inc +set @@GLOBAL.debug_dbug= @save_s3_debug; +SET debug_sync='RESET'; +--source include/start_slave.inc + + +--echo # +--echo # 21322.11: Configuration rpl_semi_sync_master_timeout of 0 should +--echo # have transaction behavior match asynchronous behavior (i.e. trxs don't +--echo # need to wait for ACKs), yet the slave should still send ACKs as a +--echo # normal semi-sync replica, and Gtid_State_Ack should still be updated +--echo # accordingly. +--echo # +--echo # The actual behavior tested in this case is as follows: +--echo # a) Transactions won't await ACKs to complete +--echo # b) Semi-sync remains ON when a transaction completes without an ACK +--echo # c) Gtid_State_Ack is updated accordingly for each replica's ACK (even +--echo # when it is behind). Here, we hold both replicas using DEBUG_SYNC to +--echo # not send their ACKs, meanwhile, we continue creating transactions +--echo # on the primary. +--echo # d) If a "very lagged" replica sends an ACK for a transaction from a +--echo # purged binlog, the Gtid_State_Ack value should be cleared, and +--echo # issue a warning to the user with the slave's last ACKed binlog +--echo # coordinate (i.e. filename and position). 
+--echo # + +--connection server_1 +set global rpl_semi_sync_master_timeout=0; + +--connection server_2 +--source include/stop_slave.inc +SET @@GLOBAL.DEBUG_DBUG="+d,synchronize_semisync_slave_reply"; +--source include/start_slave.inc +--connection server_3 +--source include/stop_slave.inc +SET @@GLOBAL.DEBUG_DBUG="+d,synchronize_semisync_slave_reply"; +--source include/start_slave.inc + +--connection server_1 +--echo # Waiting for master to recognize slave restarts.. +let $status_var= Rpl_semi_sync_master_clients; +let $status_var_value= 2; +source include/wait_for_status_var.inc; + +--echo # +--echo # 21322.11.a +connection server_1; +--let $yes_tx_orig= query_get_value(show status like 'Rpl_semi_sync_master_yes_tx', Value, 1) +--let $no_tx_orig= query_get_value(show status like 'Rpl_semi_sync_master_no_tx', Value, 1) +--let $nextval= `SELECT max(a)+1 from t1` +--eval insert into t1 values ($nextval) +--let $trx1_gtid= `SELECT @@gtid_binlog_pos` + +--echo # Waiting for Gtid_State_Sent to reflect latest transaction for all replicas.. +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Sent; +let $condition= LIKE '$trx1_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--connection server_2 +SET debug_sync='now WAIT_FOR at_slave_reply'; +--connection server_3 +SET debug_sync='now WAIT_FOR at_slave_reply'; + +--echo # Gtid_State_Ack should be empty for both replicas (as they were restarted).. +--let $binlog_gtid= `SELECT @@gtid_binlog_pos` +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= = ''; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--echo # +--echo # 21322.11.b +--connection server_1 +--echo # Ensuring semi-sync status on primary is correct.. 
+--let $yes_tx_post_commit= query_get_value(show status like 'Rpl_semi_sync_master_yes_tx', Value, 1) +if ($yes_tx_orig != $yes_tx_post_commit) +{ + --echo # yes_tx original: $yes_tx_orig + --echo # yes_tx after commit: $yes_tx_post_commit + --die Rpl_semi_sync_master_yes_tx should not have changed when Rpl_semi_sync_master_timeout is 0 +} +--let $no_tx_post_commit= query_get_value(show status like 'Rpl_semi_sync_master_no_tx', Value, 1) +if ($no_tx_orig != $no_tx_post_commit) +{ + --echo # no_tx original: $no_tx_orig + --echo # no_tx after commit: $no_tx_post_commit + --die Rpl_semi_sync_master_no_tx should not have changed when Rpl_semi_sync_master_timeout is 0 +} +--let $master_semisync_status= query_get_value(show status like 'Rpl_semi_sync_master_status', Value, 1) +if (`SELECT strcmp('$master_semisync_status','ON') != 0`) +{ + --die Master semi-sync status was disabled after transaction +} + +--echo # +--echo # 21322.11.c +--connection server_1 +--let $nextval= `SELECT max(a)+1 from t1` +--eval insert into t1 values ($nextval) +--let $trx2_gtid= `SELECT @@gtid_binlog_pos` +--source include/save_master_gtid.inc + +--echo # Waiting for Gtid_State_Sent to reflect latest transaction for all replicas.. 
+let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Sent; +let $condition= LIKE '$trx2_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--connection server_1 +--echo # server_2 and 3 should both show an empty ACK state +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT $trx1_gtid TRX1_GTID $trx2_gtid TRX2_GTID +SHOW REPLICA HOSTS; + +--echo # Let server_2 ACK just the first transaction +--connection server_2 +set debug_sync= "now SIGNAL reply_ack_to_master"; +set debug_sync= "now WAIT_FOR at_slave_reply"; + +--connection server_1 +--echo # Waiting for server_2 Gtid_State_Ack to reflect first transaction +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= LIKE '$trx1_gtid'; +let $wait_for_all= 0; +source include/wait_show_condition.inc; + +--echo # Let server_2 ACK the second transaction +--connection server_2 +set debug_sync= "now SIGNAL reply_ack_to_master"; +--source include/sync_with_master_gtid.inc + +--connection server_1 +--echo # Waiting for server_2 Gtid_State_Ack to reflect second transaction +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= LIKE '$trx2_gtid'; +let $wait_for_all= 0; +source include/wait_show_condition.inc; + +--echo # Let server_3 now ACK the first transaction +--connection server_3 +set debug_sync= "now SIGNAL reply_ack_to_master"; +set debug_sync= "now WAIT_FOR at_slave_reply"; + +--connection server_1 +--echo # Waiting for server_3 Gtid_State_Ack to reflect first transaction +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= LIKE '$trx1_gtid'; +let $wait_for_all= 0; +source include/wait_show_condition.inc; + +--echo # Let server_3 ACK the second transaction +--connection server_3 +set debug_sync= "now SIGNAL reply_ack_to_master"; +--source include/sync_with_master_gtid.inc + +--connection server_1 +--echo # Waiting for Gtid_State_Ack 
to reflect second transaction for both servers +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= LIKE '$trx2_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + + +--echo # +--echo # 21322.11.d +--connection server_1 +--let $trx2_binlog_file= query_get_value(SHOW BINARY LOGS, Log_name, 1) +--let $trx2_binlog_pos= query_get_value(SHOW BINARY LOGS, File_size, 1) +FLUSH LOGS; +--let $nextval= `SELECT max(a)+1 from t1` +--eval insert into t1 values ($nextval) +--let $trx3_gtid= `SELECT @@gtid_binlog_pos` +--source include/save_master_gtid.inc + +--echo # Waiting for Gtid_State_Sent to reflect latest transaction for all replicas.. +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Sent; +let $condition= LIKE '$trx3_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--connection server_2 +set debug_sync= "now WAIT_FOR at_slave_reply"; +--connection server_3 +set debug_sync= "now WAIT_FOR at_slave_reply"; + +--connection server_1 +--echo # server_2 and 3 should both show ACKed TRX2 (with TRX3 Sent) +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT $trx2_gtid TRX2_GTID $trx3_gtid TRX3_GTID +SHOW REPLICA HOSTS; + +--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1) +--let $purge_binlogs_to=$purge_to_binlog +--source include/wait_for_purge.inc + +--echo # Master should warn that the binary log which contains the last ACKed +--echo # binlog coordinates has been purged, and clear Gtid_State_Ack +--enable_warnings +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT $trx2_gtid TRX2_GTID $trx3_gtid TRX3_GTID $trx2_binlog_file TRX2_BINLOG_FILE $trx2_binlog_pos TRX2_BINLOG_POS +SHOW REPLICA HOSTS; + +--echo # Let servers ACK new transaction +--connection server_2 +set debug_sync= "now SIGNAL reply_ack_to_master"; +--connection server_3 +set debug_sync= "now 
SIGNAL reply_ack_to_master"; + +--connection server_1 +--echo # Waiting for Gtid_State_Ack to reflect latest transaction for all replicas.. +let $show_statement= SHOW REPLICA HOSTS; +let $field= Gtid_State_Ack; +let $condition= LIKE '$trx3_gtid'; +let $wait_for_all= 1; +source include/wait_show_condition.inc; + +--echo # Gtid_State_Ack should now show the latest transaction GTID +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT $trx2_gtid TRX2_GTID $trx3_gtid TRX3_GTID +SHOW REPLICA HOSTS; +--disable_warnings + +--connection server_2 +--source include/sync_with_master_gtid.inc +--connection server_3 +--source include/sync_with_master_gtid.inc + + +--echo # +--echo # MDEV-21322 Cleanup +connection server_1; +set @@global.debug_dbug= @save_primary_dbug; +set @@global.rpl_semi_sync_master_timeout= @save_semisync_timeout; +set @@global.rpl_semi_sync_master_enabled= @save_semisync_master_enabled; + +drop table t1; +--source include/save_master_gtid.inc +--save_master_pos + +connection server_2; +--source include/sync_with_master_gtid.inc +--source include/stop_slave.inc +set @@global.rpl_semi_sync_slave_enabled= @save_semisync_server_2_enabled; +--eval SET @@GLOBAL.debug_dbug= "$save_server_2_dbug" +SET debug_sync='RESET'; +--source include/start_slave.inc + +connection server_3; +--source include/sync_with_master_gtid.inc +--source include/stop_slave.inc +set @@global.rpl_semi_sync_slave_enabled= @save_semisync_server_3_enabled; +--eval SET @@GLOBAL.debug_dbug= "$save_server_3_dbug" +SET debug_sync='RESET'; +--source include/start_slave.inc + +--echo # +--echo # End of MDEV-21322 tests +--echo # + +# End of rpl_show_slave_hosts.test --source include/rpl_end.inc diff --git a/sql/repl_failsafe.cc b/sql/repl_failsafe.cc index d0285b54928..67a12cee80d 100644 --- a/sql/repl_failsafe.cc +++ b/sql/repl_failsafe.cc @@ -37,18 +37,7 @@ #include "rpl_filter.h" #include "log_event.h" #include <mysql.h> - - -struct Slave_info 
-{ - uint32 server_id; - uint32 master_id; - char host[HOSTNAME_LENGTH*SYSTEM_CHARSET_MBMAXLEN+1]; - char user[USERNAME_LENGTH+1]; - char password[MAX_PASSWORD_LENGTH*SYSTEM_CHARSET_MBMAXLEN+1]; - uint16 port; -}; - +#include "semisync_master.h" Atomic_counter<uint32_t> binlog_dump_thread_count; ulong rpl_status=RPL_NULL; @@ -125,8 +114,11 @@ int THD::register_slave(uchar *packet, size_t packet_length) if (check_access(this, PRIV_COM_REGISTER_SLAVE, any_db.str, NULL,NULL,0,0)) return 1; if (!(si= (Slave_info*)my_malloc(key_memory_SLAVE_INFO, sizeof(Slave_info), - MYF(MY_WME)))) + MYF(MY_WME|MY_ZEROFILL)))) return 1; + memset(si->gtid_state_sent.log_file, '\0', FN_REFLEN); + memset(si->gtid_state_ack.log_file, '\0', FN_REFLEN); + si->sync_status= Slave_info::SYNC_STATE_INITIALIZING; variables.server_id= si->server_id= uint4korr(p); p+= 4; @@ -179,7 +171,10 @@ static my_bool show_slave_hosts_callback(THD *thd, Protocol *protocol) { my_bool res= FALSE; mysql_mutex_lock(&thd->LOCK_thd_data); - if (auto si= thd->slave_info) + String gtid_sent, gtid_ack; + const char *sync_str; + const char *err_msg= NULL; + if (const Slave_info *si= thd->slave_info) { protocol->prepare_for_resend(); protocol->store(si->server_id); @@ -191,6 +186,50 @@ static my_bool show_slave_hosts_callback(THD *thd, Protocol *protocol) } protocol->store((uint32) si->port); protocol->store(si->master_id); + + if (gtid_state_from_binlog_pos(si->gtid_state_sent.log_file, + (uint32) si->gtid_state_sent.log_pos, + &gtid_sent, &err_msg)) + { + gtid_sent.length(0); + DBUG_ASSERT(err_msg); + if (global_system_variables.log_warnings >= 2) + push_warning_printf( + current_thd, Sql_condition::WARN_LEVEL_WARN, + ER_MASTER_CANNOT_RECONSTRUCT_GTID_STATE_FOR_BINLOG_POS, + ER_THD(current_thd, + ER_MASTER_CANNOT_RECONSTRUCT_GTID_STATE_FOR_BINLOG_POS), + si->gtid_state_sent.log_pos, si->gtid_state_sent.log_file, + err_msg); + } + protocol->store(&gtid_sent); + + if (rpl_semi_sync_master_enabled && thd->semi_sync_slave) + { 
+ if (gtid_state_from_binlog_pos(si->gtid_state_ack.log_file, + (uint32) si->gtid_state_ack.log_pos, + &gtid_ack, &err_msg)) + { + gtid_ack.length(0); + DBUG_ASSERT(err_msg); + + if (global_system_variables.log_warnings >= 2) + { + push_warning_printf( + current_thd, Sql_condition::WARN_LEVEL_WARN, + ER_MASTER_CANNOT_RECONSTRUCT_GTID_STATE_FOR_BINLOG_POS, + ER_THD(current_thd, + ER_MASTER_CANNOT_RECONSTRUCT_GTID_STATE_FOR_BINLOG_POS), + si->gtid_state_ack.log_pos, si->gtid_state_ack.log_file, + err_msg); + } + } + } + protocol->store(&gtid_ack); + + sync_str= si->get_sync_status_str(); + protocol->store(sync_str, safe_strlen(sync_str), &my_charset_bin); + res= protocol->write(); } mysql_mutex_unlock(&thd->LOCK_thd_data); @@ -235,6 +274,20 @@ bool show_slave_hosts(THD* thd) Item_return_int(thd, "Master_id", 10, MYSQL_TYPE_LONG), thd->mem_root); + /* Length matches GTID_IO_Pos of SHOW SLAVE STATUS on slave */ + field_list.push_back(new (mem_root) + Item_empty_string(thd, "Gtid_State_Sent", 30), + thd->mem_root); + + field_list.push_back(new (mem_root) + Item_empty_string(thd, "Gtid_State_Ack", 30), + thd->mem_root); + + /* For the length, use the size of the longest possible value */ + field_list.push_back(new (mem_root) Item_empty_string( + thd, "Sync_Status", sizeof("Semi-sync Active")), + thd->mem_root); + if (protocol->send_result_set_metadata(&field_list, Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF)) DBUG_RETURN(TRUE); diff --git a/sql/semisync_master.cc b/sql/semisync_master.cc index fdf2cf21cf1..8b07c25040e 100644 --- a/sql/semisync_master.cc +++ b/sql/semisync_master.cc @@ -53,13 +53,6 @@ ulonglong rpl_semi_sync_master_trx_wait_time = 0; Repl_semi_sync_master repl_semisync_master; Ack_receiver ack_receiver; -/* - structure to save transaction log filename and position -*/ -typedef struct Trans_binlog_info { - my_off_t log_pos; - char log_file[FN_REFLEN]; -} Trans_binlog_info; static int get_wait_time(const struct timespec& start_ts); @@ -591,7 +584,7 @@ void 
Repl_semi_sync_master::remove_slave() @retval -1 Slave is going down (ok) */ -int Repl_semi_sync_master::report_reply_packet(uint32 server_id, +int Repl_semi_sync_master::report_reply_packet(Slave_info *slave_info, const uchar *packet, ulong packet_len) { @@ -635,12 +628,12 @@ int Repl_semi_sync_master::report_reply_packet(uint32 server_id, DBUG_ASSERT(dirname_length(log_file_name) == 0); - DBUG_PRINT("semisync", ("%s: Got reply(%s, %lu) from server %u", - "Repl_semi_sync_master::report_reply_packet", - log_file_name, (ulong)log_file_pos, server_id)); - + DBUG_PRINT("semisync", + ("%s: Got reply(%s, %lu) from server %u", + "Repl_semi_sync_master::report_reply_packet", log_file_name, + (ulong) log_file_pos, slave_info->server_id)); rpl_semi_sync_master_get_ack++; - report_reply_binlog(server_id, log_file_name, log_file_pos); + report_reply_binlog(slave_info, log_file_name, log_file_pos); DBUG_RETURN(0); l_end: @@ -649,13 +642,13 @@ int Repl_semi_sync_master::report_reply_packet(uint32 server_id, octet2hex(buf, (const unsigned char*) packet, MY_MIN(sizeof(buf)-1, (size_t) packet_len)); sql_print_information("First bytes of the packet from semisync slave " - "server-id %d: %s", server_id, buf); + "server-id %d: %s", slave_info->server_id, buf); } DBUG_RETURN(result); } -int Repl_semi_sync_master::report_reply_binlog(uint32 server_id, +int Repl_semi_sync_master::report_reply_binlog(Slave_info *slave_info, const char *log_file_name, my_off_t log_file_pos) { @@ -675,7 +668,7 @@ int Repl_semi_sync_master::report_reply_binlog(uint32 server_id, if (!is_on()) /* We check to see whether we can switch semi-sync ON. */ - try_switch_on(server_id, log_file_name, log_file_pos); + try_switch_on(slave_info->server_id, log_file_name, log_file_pos); /* The position should increase monotonically, if there is only one * thread sending the binlog to the slave. 
@@ -719,6 +712,19 @@ int Repl_semi_sync_master::report_reply_binlog(uint32 server_id, DBUG_PRINT("semisync", ("%s: Got reply at (%s, %lu)", "Repl_semi_sync_master::report_reply_binlog", log_file_name, (ulong)log_file_pos)); + goto update_gtid_state_ack; + } + else if (rpl_semi_sync_master_clients > 1 && + Active_tranx::compare(slave_info->gtid_state_ack.log_file, + slave_info->gtid_state_ack.log_pos, + m_reply_file_name, m_reply_file_pos)) + { +update_gtid_state_ack: + /* + Each slave should still maintain its Gtid_state_ack + */ + strncpy(slave_info->gtid_state_ack.log_file, log_file_name, strlen(log_file_name)); + slave_info->gtid_state_ack.log_pos= log_file_pos; } @@ -828,7 +834,7 @@ int Repl_semi_sync_master::dump_start(THD* thd, } add_slave(); - report_reply_binlog(thd->variables.server_id, + report_reply_binlog(thd->slave_info, log_file + dirname_length(log_file), log_pos); sql_print_information("Start semi-sync binlog_dump to slave " "(server_id: %ld), pos(%s, %lu)", @@ -858,6 +864,16 @@ int Repl_semi_sync_master::commit_trx(const char *trx_wait_binlog_name, bool success= 0; DBUG_ENTER("Repl_semi_sync_master::commit_trx"); + /* + If the semi-sync timeout is set to 0, we effectively are configured for + asynchronous replication; except we still want to request/receive ACKs from + slaves so we can monitor replication status via SHOW SLAVE HOSTS columns + Gtid_State_Sent and Gtid_State_Ack. Thereby, we should quit now before + updating rpl_semi_sync_master_(no/yes)_transactions. + */ + if (!m_wait_timeout) + DBUG_RETURN(0); + if (!rpl_semi_sync_master_clients && !rpl_semi_sync_master_wait_no_slave) { rpl_semi_sync_master_no_transactions++; @@ -1235,6 +1251,14 @@ int Repl_semi_sync_master::update_sync_header(THD* thd, unsigned char *packet, *need_sync= sync; l_end: + if (is_on()) + { + thd->slave_info->sync_status= + sync ? 
thd->slave_info->sync_status= + Slave_info::SYNC_STATE_SEMI_SYNC_ACTIVE + : thd->slave_info->sync_status= + Slave_info::SYNC_STATE_SEMI_SYNC_STALE; + } unlock(); /* diff --git a/sql/semisync_master.h b/sql/semisync_master.h index 3978d21a61d..cfb5e3d150b 100644 --- a/sql/semisync_master.h +++ b/sql/semisync_master.h @@ -28,6 +28,91 @@ extern PSI_mutex_key key_LOCK_binlog; extern PSI_cond_key key_COND_binlog_send; #endif + +/* + structure to save transaction log filename and position +*/ +typedef struct Trans_binlog_info { + my_off_t log_pos; + char log_file[FN_REFLEN]; +} Trans_binlog_info; + + +struct Slave_info +{ +public: + enum synchronization_status { + /* + Binlog dump thread is initializing, we don't yet know the synchronization + status + */ + SYNC_STATE_INITIALIZING, + + /* + Slave is asynchronous, so Gtid_State_Ack will not be updated + */ + SYNC_STATE_ASYNCHRONOUS, + + /* + Slave is configured for semi-sync, but connected with an old state, and + is catching up now + */ + SYNC_STATE_SEMI_SYNC_STALE, + + /* + Slave is configured for semi-sync, and is readily ACKing new transactions + */ + SYNC_STATE_SEMI_SYNC_ACTIVE + }; + + uint32 server_id; + uint32 master_id; + char host[HOSTNAME_LENGTH*SYSTEM_CHARSET_MBMAXLEN+1]; + char user[USERNAME_LENGTH+1]; + char password[MAX_PASSWORD_LENGTH*SYSTEM_CHARSET_MBMAXLEN+1]; + uint16 port; + + /* + Binlog file:pos of the last transaction sent to this replica. Used to infer + Gtid_State_Sent in SHOW REPLICA HOSTS. Used for both asynchronous and + semi-sync connections. + */ + Trans_binlog_info gtid_state_sent; + + /* + If replica is configured for semi-sync, the binlog file:pos of the last + transaction ACKed by this replica. Used to infer Gtid_State_Ack in + SHOW REPLICA HOSTS. + */ + Trans_binlog_info gtid_state_ack; + + /* + Sync_Status of SHOW REPLICA HOSTS. 
+ */ + synchronization_status sync_status; + + const char *get_sync_status_str() const + { + const char *ret; + switch (sync_status) + { + case SYNC_STATE_INITIALIZING: + ret= "Initializing"; + break; + case SYNC_STATE_ASYNCHRONOUS: + ret= "Asynchronous"; + break; + case SYNC_STATE_SEMI_SYNC_STALE: + ret= "Semi-sync Stale"; + break; + default: + ret= "Semi-sync Active"; + } + return ret; + } +}; + + struct Tranx_node { char log_name[FN_REFLEN]; my_off_t log_pos; @@ -561,14 +646,14 @@ class Repl_semi_sync_master void remove_slave(); /* It parses a reply packet and call report_reply_binlog to handle it. */ - int report_reply_packet(uint32 server_id, const uchar *packet, - ulong packet_len); + int report_reply_packet(Slave_info *slave_info, const uchar *packet, + ulong packet_len); /* In semi-sync replication, reports up to which binlog position we have * received replies from the slave indicating that it already get the events. * * Input: - * server_id - (IN) master server id number + * slave_info - (IN) info of the slave which sent the ACK * log_file_name - (IN) binlog file name * end_offset - (IN) the offset in the binlog file up to which we have * the replies from the slave @@ -576,7 +661,7 @@ class Repl_semi_sync_master * Return: * 0: success; non-zero: error */ - int report_reply_binlog(uint32 server_id, + int report_reply_binlog(Slave_info *slave_info, const char* log_file_name, my_off_t end_offset); diff --git a/sql/semisync_master_ack_receiver.cc b/sql/semisync_master_ack_receiver.cc index 29fa5fd5328..ba2ea0c9762 100644 --- a/sql/semisync_master_ack_receiver.cc +++ b/sql/semisync_master_ack_receiver.cc @@ -16,6 +16,7 @@ #include <my_global.h> #include "semisync_master.h" #include "semisync_master_ack_receiver.h" +#include "debug_sync.h" #ifdef HAVE_PSI_MUTEX_INTERFACE extern PSI_mutex_key key_LOCK_ack_receiver; @@ -353,7 +354,20 @@ void Ack_receiver::run() if (likely(len != packet_error)) { int res; - res= 
repl_semisync_master.report_reply_packet(slave->server_id(), +#ifdef ENABLED_DEBUG_SYNC + /* + A (+d,pause_ack_thread_on_next_ack)-test is supposed to + be run to check `Gtid_state_ack` in show replica hosts + for cases where there are multiple active replicas. + */ + DBUG_EXECUTE_IF("pause_ack_thread_on_next_ack", + { + const char act[]= "now SIGNAL pause_ack_reply_to_binlog WAIT_FOR resume_ack_thread"; + DBUG_ASSERT(!debug_sync_set_action(thd, STRING_WITH_LEN(act))); + DBUG_SET("-d,pause_ack_thread_on_next_ack"); + };); +#endif + res= repl_semisync_master.report_reply_packet(slave->thd->slave_info, net.read_pos, len); if (unlikely(res < 0)) { diff --git a/sql/share/errmsg-utf8.txt b/sql/share/errmsg-utf8.txt index 22d0f4d3392..a428c1c63f8 100644 --- a/sql/share/errmsg-utf8.txt +++ b/sql/share/errmsg-utf8.txt @@ -12280,3 +12280,5 @@ ER_SEQUENCE_TABLE_CANNOT_HAVE_ANY_CONSTRAINTS eng "Sequence tables cannot have any constraints" ER_SEQUENCE_TABLE_ORDER_BY eng "ORDER BY" +ER_MASTER_CANNOT_RECONSTRUCT_GTID_STATE_FOR_BINLOG_POS + eng "Error constructing GTID state for binlog position %u in file '%s': %s" \ No newline at end of file diff --git a/sql/slave.cc b/sql/slave.cc index 54e70bc7385..8b6d09d28ac 100644 --- a/sql/slave.cc +++ b/sql/slave.cc @@ -5121,6 +5121,13 @@ Stopping slave I/O thread due to out-of-memory error from master"); { DBUG_EXECUTE_IF("simulate_delay_semisync_slave_reply", my_sleep(800000);); +#ifdef ENABLED_DEBUG_SYNC + DBUG_EXECUTE_IF("synchronize_semisync_slave_reply", + { + const char act[]= "now SIGNAL at_slave_reply WAIT_FOR reply_ack_to_master"; + DBUG_ASSERT(!debug_sync_set_action(thd, STRING_WITH_LEN(act))); + };); +#endif if (repl_semisync_slave.slave_reply(mi)) { /* @@ -6120,19 +6127,33 @@ static int queue_event(Master_info* mi, const uchar *buf, ulong event_len) // will have to refine the clause. 
DBUG_ASSERT(mi->rli.relay_log.relay_log_checksum_alg != BINLOG_CHECKSUM_ALG_UNDEF); - - // Emulate the network corruption - DBUG_EXECUTE_IF("corrupt_queue_event", - if (buf[EVENT_TYPE_OFFSET] != FORMAT_DESCRIPTION_EVENT) - { - uchar *debug_event_buf_c= const_cast<uchar*>(buf); - int debug_cor_pos = rand() % (event_len - BINLOG_CHECKSUM_LEN); - debug_event_buf_c[debug_cor_pos] =~ debug_event_buf_c[debug_cor_pos]; - DBUG_PRINT("info", ("Corrupt the event at queue_event: byte on position %d", debug_cor_pos)); - DBUG_SET("-d,corrupt_queue_event"); - } - ); - + +#ifndef DBUG_OFF + { + const char *dbug_unset; + // Emulate the network corruption + DBUG_EXECUTE_IF( + "corrupt_gtid_event", + if (buf[EVENT_TYPE_OFFSET] == GTID_EVENT) { + dbug_unset= "-d,corrupt_gtid_event"; + goto corrupt_event; + }); + DBUG_EXECUTE_IF( + "corrupt_queue_event", + if (buf[EVENT_TYPE_OFFSET] != FORMAT_DESCRIPTION_EVENT) { + dbug_unset= "-d,corrupt_queue_event"; + corrupt_event: + uchar *debug_event_buf_c= const_cast<uchar *>(buf); + int debug_cor_pos= rand() % (event_len - BINLOG_CHECKSUM_LEN); + debug_event_buf_c[debug_cor_pos]= ~debug_event_buf_c[debug_cor_pos]; + DBUG_PRINT("info", + ("Corrupt the event at queue_event: byte on position %d", + debug_cor_pos)); + DBUG_SET(dbug_unset); + }); + } +#endif + if (event_checksum_test((uchar*) buf, event_len, checksum_alg)) { error= ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE; diff --git a/sql/sql_repl.cc b/sql/sql_repl.cc index 8ca252c37c6..18a52dc07ad 100644 --- a/sql/sql_repl.cc +++ b/sql/sql_repl.cc @@ -1830,18 +1830,26 @@ gtid_state_from_pos(const char *name, uint32 offset, return errormsg; } - -int -gtid_state_from_binlog_pos(const char *in_name, uint32 pos, String *out_str) +int gtid_state_from_binlog_pos(const char *in_name, uint32 pos, + String *out_str, const char **out_err) { slave_connection_state gtid_state; const char *lookup_name; char name_buf[FN_REFLEN]; LOG_INFO linfo; + const char *dummy_err; + const char **err_save; + int find_err= 0; 
+ + if (out_err) + err_save= out_err; + else + err_save= &dummy_err; if (!mysql_bin_log.is_open()) { my_error(ER_NO_BINARY_LOGGING, MYF(0)); + *err_save= "Binary logging is disabled."; return 1; } @@ -1853,15 +1861,27 @@ gtid_state_from_binlog_pos(const char *in_name, uint32 pos, String *out_str) else lookup_name= NULL; linfo.index_file_offset= 0; - if (mysql_bin_log.find_log_pos(&linfo, lookup_name, 1)) + if ((find_err= mysql_bin_log.find_log_pos(&linfo, lookup_name, 1))) + { + if (find_err == LOG_INFO_EOF) + *err_save= "Could not find binary log file. Probably the slave state is " + "too old and required binlog files have been purged."; + else + *err_save= "Error reading index file."; return 1; + } if (pos < 4) pos= 4; - if (gtid_state_from_pos(linfo.log_file_name, pos, >id_state) || + if ((*err_save= + gtid_state_from_pos(linfo.log_file_name, pos, >id_state)) || gtid_state.to_string(out_str)) + { + if (!*err_save) + *err_save= "Failed converting GTID state to string representation."; return 1; + } return 0; } @@ -2272,6 +2292,30 @@ send_event_to_slave(binlog_send_info *info, Log_event_type event_type, return "Failed to run hook 'after_send_event'"; } + if (info->thd->slave_info) + { + strncpy(info->thd->slave_info->gtid_state_sent.log_file, + info->log_file_name + info->dirlen, + strlen(info->log_file_name) - info->dirlen); + info->thd->slave_info->gtid_state_sent.log_pos= pos; + } + +#ifdef ENABLED_DEBUG_SYNC + DBUG_EXECUTE_IF("pause_dump_thread_after_sending_next_full_trx", { + if (event_type == XID_EVENT || + (event_type == QUERY_EVENT && + Query_log_event::peek_is_commit_rollback( + (uchar *) packet->ptr() + ev_offset, len - ev_offset, + current_checksum_alg))) + { + DBUG_ASSERT(!debug_sync_set_action( + info->thd, STRING_WITH_LEN("now SIGNAL dump_thread_paused " + "WAIT_FOR dump_thread_continue"))); + DBUG_SET("-d,pause_dump_thread_after_sending_next_full_trx"); + } + }); +#endif + return NULL; /* Success */ } @@ -2753,6 +2797,10 @@ static int 
wait_new_events(binlog_send_info *info, /* in */ break; } + if (info->thd->semi_sync_slave) + info->thd->slave_info->sync_status= + Slave_info::SYNC_STATE_SEMI_SYNC_ACTIVE; + if (info->heartbeat_period) { struct timespec ts; @@ -3087,6 +3135,14 @@ void mysql_binlog_send(THD* thd, char* log_ident, my_off_t pos, /* Check if the dump thread is created by a slave with semisync enabled. */ thd->semi_sync_slave = is_semi_sync_slave(); + /* + If the slave is not set up for a semi-sync connection, we can tag it + immediately as asynchronous. Otherwise, we need to wait and see if the + replica is up-to-date or not to mark semi-sync active vs stale. + */ + if (thd->slave_info && !thd->semi_sync_slave) + thd->slave_info->sync_status= Slave_info::SYNC_STATE_ASYNCHRONOUS; + DBUG_ASSERT(pos == linfo.pos); if (repl_semisync_master.dump_start(thd, linfo.log_file_name, linfo.pos)) diff --git a/sql/sql_repl.h b/sql/sql_repl.h index 51b6a599d5f..5e3ccc9df45 100644 --- a/sql/sql_repl.h +++ b/sql/sql_repl.h @@ -66,7 +66,8 @@ void rpl_init_gtid_slave_state(); void rpl_deinit_gtid_slave_state(); void rpl_init_gtid_waiting(); void rpl_deinit_gtid_waiting(); -int gtid_state_from_binlog_pos(const char *name, uint32 pos, String *out_str); +int gtid_state_from_binlog_pos(const char *name, uint32 pos, String *out_str, + const char **out_err= NULL); int rpl_append_gtid_state(String *dest, bool use_binlog); int rpl_load_gtid_state(slave_connection_state *state, bool use_binlog); bool rpl_gtid_pos_check(THD *thd, char *str, size_t len); -- 2.30.2
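
A note on the two strncpy() calls in the patch (the gtid_state_ack update in report_reply_binlog() and the gtid_state_sent update in send_event_to_slave()): both pass the length of the source string as the bound, so strncpy() copies the characters of the file name but does not write a terminating NUL. Whether that matters in practice depends on how Slave_info is initialised and on binlog names keeping a constant length, neither of which is visible in this diff. Below is a minimal sketch of a bounded copy that always terminates. All names in it (Binlog_pos, set_binlog_pos, NAME_CAP) are hypothetical stand-ins for illustration, not server code, and NAME_CAP is an assumed capacity rather than the real FN_REFLEN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for Trans_binlog_info. NAME_CAP plays the role of
// FN_REFLEN; its value here is an assumption, not taken from the patch.
constexpr std::size_t NAME_CAP= 512;

struct Binlog_pos
{
  unsigned long long log_pos;
  char log_file[NAME_CAP];
};

// Copy at most NAME_CAP - 1 bytes of `name` and always terminate the result,
// so a shorter name never inherits the tail of a longer one stored earlier.
inline void set_binlog_pos(Binlog_pos *dst, const char *name,
                           unsigned long long pos)
{
  std::size_t len= std::strlen(name);
  if (len >= NAME_CAP)
    len= NAME_CAP - 1;              // truncate rather than overflow
  std::memcpy(dst->log_file, name, len);
  dst->log_file[len]= '\0';         // explicit terminator
  dst->log_pos= pos;
}
```

With the strncpy(dst, src, strlen(src)) form, storing a long name and then a shorter one into the same buffer leaves the tail of the longer name after the shorter one. The helper above avoids that without relying on the destination having been zero-filled.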