[Maria-developers] On the issue of Seconds_behind_master and Parallel Replication
It was brought to my attention an issue with parallel replication and the Seconds_Behind_Master field of SHOW SLAVE STATUS. I have a possible patch for this, but I wanted to discuss it on the list, as it changes semantics compared to the non-parallel case. Each binlog event contains a timestamp (**) of when the event was created on the master. Whenever the slave SQL thread reads an event from the relay log, it updates the value of Seconds_Behind_Master to the difference between the slave's current time and the event's timestamp. Now in parallel replication, the SQL thread can read a large number of events from the relay log and queue them in-memory for the worker threads. So a small value of Seconds_Behind_Master means only that recent events have been queued - it might still be a long time before the worker threads have had time to actually execute all the queued events. Apparently the problem is (justified) user confusion about this queuing delay not being reflected in Seconds_Behind_Master. The same problem actually exists in the non-parallel case. In case of a large transaction, the Seconds_Behind_Master can be small even though there is still a large amount of execution time remaining for the transaction to complete on the slave. However, in the non-parallel case, at most one transaction can be involved. In the parallel case, the problem is amplified by the potential of thousands of queued transactions awaiting execution. So how to solve it? Attached is a patch that implements one possible solution: the Seconds_Behind_Master is only updated after a transaction commits, with the timestamp of the commit events. This seems more intuitive anyway. But it does introduce a semantic difference between the non-parallel and parallel behaviour for Seconds_Behind_Master. The value will in general be larger on a parallel slave than on a non-parallel slave, for the same actual slave lag. Monty suggested changing the behaviour also for non-parallel mode - letting Seconds_Behind_Master reflect only events actually committed, not just read from the relay log. This would introduce an incompatible behaviour for Seconds_Behind_Master, but could perhaps be done for 10.1, if desired. Doing it in stable 10.0 would be more drastic. So any opinions on this? - Should Seconds_Behind_Master be changed as per above in parallel replication (from 10.0 on)? - If not, any suggestion for another semantics for Seconds_Behind_Master in parallel replication? - If so, should the change to Seconds_Behind_Master also be done in the non-parallel case in 10.1? What about 10.0? - Any comments on the patch? Here is a test case that shows the problem. It simulates a slave catching up with a bunch of INSERT(SLEEP(1)). Without my patch, Seconds_Behind_Master=2 just after starting the parallel slave, while the non-parallel slave shows Seconds_Behind_Master=9. With my patch, the parallel slave shows Seconds_Behind_Master=11. ----------------------------------------------------------------------- --source include/have_innodb.inc --source include/have_binlog_format_statement.inc --let $rpl_topology=1->2 --source include/rpl_init.inc # Test various aspects of parallel replication. --connection server_1 ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB; CALL mtr.add_suppression("Unsafe statement written to the binary log using statement format"); CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; --save_master_pos --connection server_2 --sync_with_master --source include/stop_slave.inc SET GLOBAL slave_parallel_threads=1; --connection server_1 --disable_warnings INSERT INTO t1 VALUES (1, SLEEP(1)); INSERT INTO t1 VALUES (2, SLEEP(1)); INSERT INTO t1 VALUES (3, SLEEP(1)); INSERT INTO t1 VALUES (4, SLEEP(1)); INSERT INTO t1 VALUES (5, SLEEP(1)); INSERT INTO t1 VALUES (6, SLEEP(1)); INSERT INTO t1 VALUES (7, SLEEP(1)); INSERT INTO t1 VALUES (8, SLEEP(1)); INSERT INTO t1 VALUES (9, SLEEP(1)); INSERT INTO t1 VALUES (10, SLEEP(1)); --disable_warnings --save_master_pos --connection server_2 --source include/start_slave.inc SELECT SLEEP(1); --let $status_items= Seconds_Behind_Master --source include/show_slave_status.inc --sync_with_master --let $status_items= Seconds_Behind_Master --source include/show_slave_status.inc # Clean up. --connection server_2 --source include/stop_slave.inc SET GLOBAL slave_parallel_threads=DEFAULT; --source include/start_slave.inc --connection server_1 DROP TABLE t1; --source include/rpl_end.inc ----------------------------------------------------------------------- Thanks, - Kristian. ** The full detail is a bit more complex. Some events actually contains the time when the corresponding query _started_ executing, and another value which is the duration of execution. These are added by the slave to get the timestamp for the creation of the event (at the end of query exection). Also, the slave adjusts the timestamp with any time difference between the master and slave clock (eg. time zone for example).
Insightful explanation, thank you. On 10/15/15 08:16, Kristian Nielsen wrote:
So any opinions on this?
This is /my/ opinion, not discussed with other Booking.com people.
- Should Seconds_Behind_Master be changed as per above in parallel replication (from 10.0 on)?
Yes, I think so ... or at least /optionally/ so. Alternatively, add new variable (perhaps something like SECONDS_BEHIND_COMMITTED) and leave the existing behavior as-is.
- If not, any suggestion for another semantics for Seconds_Behind_Master in parallel replication?
- If so, should the change to Seconds_Behind_Master also be done in the non-parallel case in 10.1? What about 10.0?
If we stick with one variable, can the behavior be easily made to be configurable? If so, then DBAs could decide which makes more sense in their environment. Cheers, -Eric -- http://www.freesa.org/ -- mobile: +31 620719662 aim: ericigps -- skype: eric_herman -- jabber: eric.herman@gmail.com
From a user's perspective, I like the idea of introducing the change for both parallel and non-parallel in 10.1. On 15/10/2015 08:16, Kristian Nielsen wrote:
It was brought to my attention an issue with parallel replication and the Seconds_Behind_Master field of SHOW SLAVE STATUS. I have a possible patch for this, but I wanted to discuss it on the list, as it changes semantics compared to the non-parallel case.
Each binlog event contains a timestamp (**) of when the event was created on the master. Whenever the slave SQL thread reads an event from the relay log, it updates the value of Seconds_Behind_Master to the difference between the slave's current time and the event's timestamp.
Now in parallel replication, the SQL thread can read a large number of events from the relay log and queue them in-memory for the worker threads. So a small value of Seconds_Behind_Master means only that recent events have been queued - it might still be a long time before the worker threads have had time to actually execute all the queued events. Apparently the problem is (justified) user confusion about this queuing delay not being reflected in Seconds_Behind_Master.
The same problem actually exists in the non-parallel case. In case of a large transaction, the Seconds_Behind_Master can be small even though there is still a large amount of execution time remaining for the transaction to complete on the slave. However, in the non-parallel case, at most one transaction can be involved. In the parallel case, the problem is amplified by the potential of thousands of queued transactions awaiting execution.
So how to solve it? Attached is a patch that implements one possible solution: the Seconds_Behind_Master is only updated after a transaction commits, with the timestamp of the commit events. This seems more intuitive anyway. But it does introduce a semantic difference between the non-parallel and parallel behaviour for Seconds_Behind_Master. The value will in general be larger on a parallel slave than on a non-parallel slave, for the same actual slave lag.
Monty suggested changing the behaviour also for non-parallel mode - letting Seconds_Behind_Master reflect only events actually committed, not just read from the relay log. This would introduce an incompatible behaviour for Seconds_Behind_Master, but could perhaps be done for 10.1, if desired. Doing it in stable 10.0 would be more drastic.
So any opinions on this?
- Should Seconds_Behind_Master be changed as per above in parallel replication (from 10.0 on)?
- If not, any suggestion for another semantics for Seconds_Behind_Master in parallel replication?
- If so, should the change to Seconds_Behind_Master also be done in the non-parallel case in 10.1? What about 10.0?
- Any comments on the patch?
I've pushed to 10.0 and 10.1 the change I described: With parallel replication, seconds_behind_master is updated only after transactions commit. (I did not change the behaviour of the non-parallel replication case yet.) Ian, do you have enough information from this thread that you could update the docs in the knowledgebase accordingly? - Kristian. Ian Gilfillan <ian@mariadb.org> writes:
From a user's perspective, I like the idea of introducing the change for both parallel and non-parallel in 10.1.
On 15/10/2015 08:16, Kristian Nielsen wrote:
It was brought to my attention an issue with parallel replication and the Seconds_Behind_Master field of SHOW SLAVE STATUS. I have a possible patch for this, but I wanted to discuss it on the list, as it changes semantics compared to the non-parallel case.
Each binlog event contains a timestamp (**) of when the event was created on the master. Whenever the slave SQL thread reads an event from the relay log, it updates the value of Seconds_Behind_Master to the difference between the slave's current time and the event's timestamp.
Now in parallel replication, the SQL thread can read a large number of events from the relay log and queue them in-memory for the worker threads. So a small value of Seconds_Behind_Master means only that recent events have been queued - it might still be a long time before the worker threads have had time to actually execute all the queued events. Apparently the problem is (justified) user confusion about this queuing delay not being reflected in Seconds_Behind_Master.
The same problem actually exists in the non-parallel case. In case of a large transaction, the Seconds_Behind_Master can be small even though there is still a large amount of execution time remaining for the transaction to complete on the slave. However, in the non-parallel case, at most one transaction can be involved. In the parallel case, the problem is amplified by the potential of thousands of queued transactions awaiting execution.
So how to solve it? Attached is a patch that implements one possible solution: the Seconds_Behind_Master is only updated after a transaction commits, with the timestamp of the commit events. This seems more intuitive anyway. But it does introduce a semantic difference between the non-parallel and parallel behaviour for Seconds_Behind_Master. The value will in general be larger on a parallel slave than on a non-parallel slave, for the same actual slave lag.
Monty suggested changing the behaviour also for non-parallel mode - letting Seconds_Behind_Master reflect only events actually committed, not just read from the relay log. This would introduce an incompatible behaviour for Seconds_Behind_Master, but could perhaps be done for 10.1, if desired. Doing it in stable 10.0 would be more drastic.
So any opinions on this?
- Should Seconds_Behind_Master be changed as per above in parallel replication (from 10.0 on)?
- If not, any suggestion for another semantics for Seconds_Behind_Master in parallel replication?
- If so, should the change to Seconds_Behind_Master also be done in the non-parallel case in 10.1? What about 10.0?
- Any comments on the patch?
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
On 14/11/2015 11:44, Kristian Nielsen wrote:
Ian, do you have enough information from this thread that you could update the docs in the knowledgebase accordingly? Thanks, yes, I will update the documentation around the release time.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
The same problem actually exists in the non-parallel case. In case of a
Just to elaborate on this, here is the binlog from a transaction with some artificial long-running queries (using INSERT(SLEEP())): #151015 9:43:52 server id 1 end_log_pos 1120 GTID 0-1-5 BEGIN #151015 9:43:44 server id 1 end_log_pos 1210 Query thread_id=3 exec_time=0 error_code=0 insert into t1 values (1,0) #151015 9:43:44 server id 1 end_log_pos 1308 Query thread_id=3 exec_time=1 error_code=0 insert into t1 values (2, sleep(1)) #151015 9:43:45 server id 1 end_log_pos 1406 Query thread_id=3 exec_time=5 error_code=0 insert into t1 values (3, sleep(5)) #151015 9:43:50 server id 1 end_log_pos 1504 Query thread_id=3 exec_time=2 error_code=0 insert into t1 values (4, sleep(2)) #151015 9:43:52 server id 1 end_log_pos 1531 Xid = 15 COMMIT/*!*/; Suppose the slave starts executing this transaction at 9:44:00. The value of Seconds_Behind_Master will be set to something like the following as each event is read from the relay log: 8 16 15 11 14 16 Note in particular how the value jumps back and forth during the execution of the transaction. The time used on the slave is the time the event starts executing. The time used on the master is inconsistent with this, it is sometimes the time the transaction ends (BEGIN), sometimes the time the query ends. When these two values are subtracted, the result is somewhat fuzzy if the transaction runs for more than a second. Clearly, I do not have any great desire to jump through hoops and sacrifice performance in parallel replication to try and match a behaviour that seems poorly designed from the start, hence the proposed patch. (I do not have any strong opinion one way or the other on whether to change the existing behaviour of Seconds_Behind_Master in the non-parallel case.) - Kristian.
On 2015-10-15 08:16, Kristian Nielsen wrote:
Monty suggested changing the behaviour also for non-parallel mode - letting Seconds_Behind_Master reflect only events actually committed, not just read from the relay log. This would introduce an incompatible behaviour for Seconds_Behind_Master, but could perhaps be done for 10.1, if desired. Doing it in stable 10.0 would be more drastic.
I agree with Monty: we should only update on commit and in both // and non-// cases. Having yo-yo in seconds-behind-master (because of a long running transaction) should probably be considered a bug and should be fixed (non-// case). The commit event (if we can call it as such) is the only reliable (-ish) timing synchronization point in the binlog stream, all the rest depends on execution time and session-latency. The timestamp of those non-commit events should not be trusted for delay computation. Fixing that in 10.0 has too many implications, I think fixing in 10.1 is a good target. If something needs to be done in 10.0, and if it is not too hard, adding a "improved_seconds_behind_master" global variables for this behavior could be done. In 10.0, the default value of this variable would be 0, in 10.1 it would be 1. If that path is chosen, I think this variable should be deprecated in 10.1 and go away in 10.2 to reduce code complexity. Jean-François Gagné
----- On 16 Oct, 2015, at 3:57 PM, Jean-François Gagné jeanfrancois.gagne@booking.com wrote:
On 2015-10-15 08:16, Kristian Nielsen wrote:
Monty suggested changing the behaviour also for non-parallel mode - letting Seconds_Behind_Master reflect only events actually committed, not just read from the relay log. This would introduce an incompatible behaviour for Seconds_Behind_Master, but could perhaps be done for 10.1, if desired. Doing it in stable 10.0 would be more drastic.
I agree with Monty: we should only update on commit and in both // and non-// cases.
+1 here. works for me too.
Having yo-yo in seconds-behind-master (because of a long running transaction) should probably be considered a bug and should be fixed (non-// case).
The commit event (if we can call it as such) is the only reliable (-ish) timing synchronization point in the binlog stream, all the rest depends on execution time and session-latency. The timestamp of those non-commit events should not be trusted for delay computation.
Fixing that in 10.0 has too many implications, I think fixing in 10.1 is a good target.
If something needs to be done in 10.0, and if it is not too hard, adding a "improved_seconds_behind_master" global variables for this behavior could be done. In 10.0, the default value of this variable would be 0, in 10.1 it would be 1. If that path is chosen, I think this variable should be deprecated in 10.1 and go away in 10.2 to reduce code complexity.
seconds_behind_master=commit? I tend to think people using 10.0 haven't seen or cared enough about it to be worth fixing in 10.0. -- -- Daniel Black, Engineer @ Open Query (http://openquery.com.au) Remote expertise & maintenance for MySQL/MariaDB server environments.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
- Should Seconds_Behind_Master be changed as per above in parallel replication (from 10.0 on)?
- If so, should the change to Seconds_Behind_Master also be done in the non-parallel case in 10.1? What about 10.0?
So there seems to be agreement that Seconds_Behind_Master should be changed in 10.1 to only update on commit (also in non-parallel case). This does seem a better behaviour. One consequence of this is the following: Suppose the master is idle for 1 hour. The Seconds_Behind_Master is reported as 0 on a slave that is up to date. Suppose now the master executes a transaction. While that new transaction is being applied, the Seconds_Behind_Master will be reported as 3600 (one hour), then go back to zero. This is arguably the correct value - the slave is indeed in the state of the master one hour ago while executing that transaction. But I still wonder if this will not just cause another kind of confusion with users (or existing monitoring tools), as they will interpret it as the slave being far behind the master. While in reality it is only one transaction behind. - Kristian.
Hi, Is it possible to know how many transactions/operations slave is behind the master ? Assume in above that master is idle for several days, if master then executes a one transaction, does that mean that slave is behind the master also several days ? R: Jan On Thu, Oct 22, 2015 at 9:12 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
- Should Seconds_Behind_Master be changed as per above in parallel replication (from 10.0 on)?
- If so, should the change to Seconds_Behind_Master also be done in the non-parallel case in 10.1? What about 10.0?
So there seems to be agreement that Seconds_Behind_Master should be changed in 10.1 to only update on commit (also in non-parallel case). This does seem a better behaviour.
One consequence of this is the following: Suppose the master is idle for 1 hour. The Seconds_Behind_Master is reported as 0 on a slave that is up to date. Suppose now the master executes a transaction. While that new transaction is being applied, the Seconds_Behind_Master will be reported as 3600 (one hour), then go back to zero.
This is arguably the correct value - the slave is indeed in the state of the master one hour ago while executing that transaction. But I still wonder if this will not just cause another kind of confusion with users (or existing monitoring tools), as they will interpret it as the slave being far behind the master. While in reality it is only one transaction behind.
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Jan Lindström <jan.lindstrom@mariadb.com> writes:
Is it possible to know how many transactions/operations slave is behind the master ?
Not really with the current code.
Assume in above that master is idle for several days, if master then executes a one transaction, does that mean that slave is behind the master also several days ?
I think it will show so while executing the transaction, but not when that transaction has been completed, obviously. - Kristian.
participants (6)
-
Daniel Black
-
Eric Herman
-
Ian Gilfillan
-
Jan Lindström
-
Jean-François Gagné
-
Kristian Nielsen