It was brought to my attention an issue with parallel replication and the Seconds_Behind_Master field of SHOW SLAVE STATUS. I have a possible patch for this, but I wanted to discuss it on the list, as it changes semantics compared to the non-parallel case. Each binlog event contains a timestamp (**) of when the event was created on the master. Whenever the slave SQL thread reads an event from the relay log, it updates the value of Seconds_Behind_Master to the difference between the slave's current time and the event's timestamp. Now in parallel replication, the SQL thread can read a large number of events from the relay log and queue them in-memory for the worker threads. So a small value of Seconds_Behind_Master means only that recent events have been queued - it might still be a long time before the worker threads have had time to actually execute all the queued events. Apparently the problem is (justified) user confusion about this queuing delay not being reflected in Seconds_Behind_Master. The same problem actually exists in the non-parallel case. In case of a large transaction, the Seconds_Behind_Master can be small even though there is still a large amount of execution time remaining for the transaction to complete on the slave. However, in the non-parallel case, at most one transaction can be involved. In the parallel case, the problem is amplified by the potential of thousands of queued transactions awaiting execution. So how to solve it? Attached is a patch that implements one possible solution: the Seconds_Behind_Master is only updated after a transaction commits, with the timestamp of the commit events. This seems more intuitive anyway. But it does introduce a semantic difference between the non-parallel and parallel behaviour for Seconds_Behind_Master. The value will in general be larger on a parallel slave than on a non-parallel slave, for the same actual slave lag. Monty suggested changing the behaviour also for non-parallel mode - letting Seconds_Behind_Master reflect only events actually committed, not just read from the relay log. This would introduce an incompatible behaviour for Seconds_Behind_Master, but could perhaps be done for 10.1, if desired. Doing it in stable 10.0 would be more drastic. So any opinions on this? - Should Seconds_Behind_Master be changed as per above in parallel replication (from 10.0 on)? - If not, any suggestion for another semantics for Seconds_Behind_Master in parallel replication? - If so, should the change to Seconds_Behind_Master also be done in the non-parallel case in 10.1? What about 10.0? - Any comments on the patch? Here is a test case that shows the problem. It simulates a slave catching up with a bunch of INSERT(SLEEP(1)). Without my patch, Seconds_Behind_Master=2 just after starting the parallel slave, while the non-parallel slave shows Seconds_Behind_Master=9. With my patch, the parallel slave shows Seconds_Behind_Master=11. ----------------------------------------------------------------------- --source include/have_innodb.inc --source include/have_binlog_format_statement.inc --let $rpl_topology=1->2 --source include/rpl_init.inc # Test various aspects of parallel replication. --connection server_1 ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB; CALL mtr.add_suppression("Unsafe statement written to the binary log using statement format"); CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; --save_master_pos --connection server_2 --sync_with_master --source include/stop_slave.inc SET GLOBAL slave_parallel_threads=1; --connection server_1 --disable_warnings INSERT INTO t1 VALUES (1, SLEEP(1)); INSERT INTO t1 VALUES (2, SLEEP(1)); INSERT INTO t1 VALUES (3, SLEEP(1)); INSERT INTO t1 VALUES (4, SLEEP(1)); INSERT INTO t1 VALUES (5, SLEEP(1)); INSERT INTO t1 VALUES (6, SLEEP(1)); INSERT INTO t1 VALUES (7, SLEEP(1)); INSERT INTO t1 VALUES (8, SLEEP(1)); INSERT INTO t1 VALUES (9, SLEEP(1)); INSERT INTO t1 VALUES (10, SLEEP(1)); --disable_warnings --save_master_pos --connection server_2 --source include/start_slave.inc SELECT SLEEP(1); --let $status_items= Seconds_Behind_Master --source include/show_slave_status.inc --sync_with_master --let $status_items= Seconds_Behind_Master --source include/show_slave_status.inc # Clean up. --connection server_2 --source include/stop_slave.inc SET GLOBAL slave_parallel_threads=DEFAULT; --source include/start_slave.inc --connection server_1 DROP TABLE t1; --source include/rpl_end.inc ----------------------------------------------------------------------- Thanks, - Kristian. ** The full detail is a bit more complex. Some events actually contains the time when the corresponding query _started_ executing, and another value which is the duration of execution. These are added by the slave to get the timestamp for the creation of the event (at the end of query exection). Also, the slave adjusts the timestamp with any time difference between the master and slave clock (eg. time zone for example).