[Maria-developers] Problem with parallel replication in 10.2
Hi! I was trying to run a test that fails in the upcoming bb-10.2-jan on the normal 10.2 tree, when I noticed this strange issue: - Test fails with timeout when running with --debug - When looking at the trace file, I notice that we get a duplicate key error for the table gtid_slave_post (MyISAM table). Is this something normal ? To repeat: Store the included test in suite/rpl and run it with: mysql-test-run --debug --record rpl_skr Issue 2: bb-10.2-jan tree is a working tree for a merge of MariaDB 10.2 and MySQL 5.7 When running rpl_skr in 10.2 it takes 2 seconds When running it in the bb-10.2-jan tree it takes either a long time or we get a timeout. This is probably because of the new lock code in lock0lock.cc and lock0wait.cc which doesn't break conflicting transaction but instead waits for a timeout Would appreciate any help with this! Note that rpl_skr is a test that was originally part of rpl_parallel.test, but I have made it separate to be able to more easily test for this issue. Before testing bb-10.2-jan, please apply this patch that fixes one critical issue in this tree related to group-commit: diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index b69468d..0e9caed 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -4976,16 +4953,15 @@ innobase_commit( thd_wakeup_subsequent_commits(thd, 0); /* Now do a write + flush of logs. */ - if (!read_only) { - trx_commit_complete_for_mysql(trx); - } + trx_commit_complete_for_mysql(trx); trx_deregister_from_2pc(trx); } else { /* We just mark the SQL statement ended and do not do a transaction commit */ + DBUG_PRINT("info", ("Just mark SQL statement")); /* If we had reserved the auto-inc lock for some table in this SQL statement we release it now */ Regards, Monty
Michael Widenius <michael.widenius@gmail.com> writes:
I was trying to run a test that fails in the upcoming bb-10.2-jan on the normal 10.2 tree, when I noticed this strange issue:
- Test fails with timeout when running with --debug - When looking at the trace file, I notice that we get a duplicate key error for the table gtid_slave_post (MyISAM table). Is this something normal ?
Like this: 2016-09-01 10:33:20 140078976283392 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-53' for key 'PRIMARY', Gtid 0-1-52, Internal MariaDB error code: 1942 This happens because the mysql.gtid_slave_pos table is MyISAM (which is default in mysql-test-run, but not in the normal server install), and parallel replication needs to roll back a transaction after it has updated the table. Because of MyISAM, the gtid_slave_pos change cannot be rolled back. Maybe parallel replication could in this case manually undo its change in the table as part of the rollback. It's just a DELETE of the row previously inserted. In any case, currently the fix is to use InnoDB for the table: --- rpl_skr.test~ 2016-09-01 10:27:21.214633498 +0200 +++ rpl_skr.test 2016-09-01 10:35:50.660242337 +0200 @@ -8,6 +8,9 @@ --connection server_2 SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads; --source include/stop_slave.inc +SET sql_log_bin=0; +ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB; +SET sql_log_bin=1; SET GLOBAL slave_parallel_threads=10; SET GLOBAL slave_parallel_mode='conservative'; --source include/start_slave.inc
bb-10.2-jan tree is a working tree for a merge of MariaDB 10.2 and MySQL 5.7
When running rpl_skr in 10.2 it takes 2 seconds When running it in the bb-10.2-jan tree it takes either a long time or we get a timeout.
This is because of errorneous merge. The original code: if (waitee_buf_ptr) { lock_report_waiters_to_mysql(waitee_buf_ptr, start_mysql_thd, victim_trx_id); The bb-10.2-jan code: if (victim_trx && waitee_buf_ptr) { lock_report_waiters_to_mysql(waitee_buf_ptr, start_mysql_thd, victim_trx->id); So if victim_trx is NULL the waits are not reported to parallel replication at all, causing the stalls and/or hangs. victim_trx is NULL unless InnoDB itself detects a deadlock. I've attached a patch that fixes this, can also be pulled from here: https://github.com/knielsen/server/commits/montyrpl Or should I push it directly into bb-10.2-jan? This makes the rpl_skr.test complete correctly in < 1 second.
This is probably because of the new lock code in lock0lock.cc and lock0wait.cc which doesn't break conflicting transaction but instead waits for a timeout
The merge appears very rough. Shouldn't the waitee_buf be integrated into the new DeadlockChecker class? Why is it necessary to thd_report_wait_for() on internal transactions like here? /* m_trx->mysql_thd is NULL if it's an internal trx. So current_thd is used */ if (err == DB_LOCK_WAIT) { ut_ad(wait_for && wait_for->trx); wait_for->trx->abort_type = TRX_REPLICATION_ABORT; thd_report_wait_for(current_thd, wait_for->trx->mysql_thd); wait_for->trx->abort_type = TRX_SERVER_ABORT; } return(err); Maybe I should try to write a better patch for integrating this in the new InnoDB code. What do you think about changing this to use the async deadlock kill in background thread, as discussed in this thread? https://lists.launchpad.net/maria-developers/msg09902.html This would allow to simplify the code in lock0lock.cc, and avoid the locking hacks in innobase_kill_query()? - Kristian.
Hi! <cut>
2016-09-01 10:33:20 140078976283392 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-53' for key 'PRIMARY', Gtid 0-1-52, Internal MariaDB error code: 1942
This happens because the mysql.gtid_slave_pos table is MyISAM (which is default in mysql-test-run, but not in the normal server install), and parallel replication needs to roll back a transaction after it has updated the table. Because of MyISAM, the gtid_slave_pos change cannot be rolled back.
Sorry about the myisam part. I did say in #maria at once after I sent the email that the problem with 10.2 was wrong used engine, but apparently you missed that.
Maybe parallel replication could in this case manually undo its change in the table as part of the rollback. It's just a DELETE of the row previously inserted.
That could be a solution. In any case, I should at least look at adding a better error message if this happens.
In any case, currently the fix is to use InnoDB for the table:
--- rpl_skr.test~ 2016-09-01 10:27:21.214633498 +0200 +++ rpl_skr.test 2016-09-01 10:35:50.660242337 +0200 @@ -8,6 +8,9 @@ --connection server_2 SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads; --source include/stop_slave.inc +SET sql_log_bin=0; +ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB; +SET sql_log_bin=1; SET GLOBAL slave_parallel_threads=10; SET GLOBAL slave_parallel_mode='conservative'; --source include/start_slave.inc
Yes, same fix that I did.
bb-10.2-jan tree is a working tree for a merge of MariaDB 10.2 and MySQL 5.7
When running rpl_skr in 10.2 it takes 2 seconds When running it in the bb-10.2-jan tree it takes either a long time or we get a timeout.
This is because of errorneous merge. The original code:
if (waitee_buf_ptr) { lock_report_waiters_to_mysql(waitee_buf_ptr, start_mysql_thd, victim_trx_id);
The bb-10.2-jan code:
if (victim_trx && waitee_buf_ptr) { lock_report_waiters_to_mysql(waitee_buf_ptr, start_mysql_thd, victim_trx->id);
So if victim_trx is NULL the waits are not reported to parallel replication at all, causing the stalls and/or hangs. victim_trx is NULL unless InnoDB itself detects a deadlock.
I've attached a patch that fixes this, can also be pulled from here:
https://github.com/knielsen/server/commits/montyrpl
Or should I push it directly into bb-10.2-jan? This makes the rpl_skr.test complete correctly in < 1 second.
Thanks a lot for the patch! Jani will pull it into his working tree
This is probably because of the new lock code in lock0lock.cc and lock0wait.cc which doesn't break conflicting transaction but instead waits for a timeout
The merge appears very rough. Shouldn't the waitee_buf be integrated into the new DeadlockChecker class? Why is it necessary to thd_report_wait_for() on internal transactions like here?
/* m_trx->mysql_thd is NULL if it's an internal trx. So current_thd is used */ if (err == DB_LOCK_WAIT) { ut_ad(wait_for && wait_for->trx); wait_for->trx->abort_type = TRX_REPLICATION_ABORT; thd_report_wait_for(current_thd, wait_for->trx->mysql_thd); wait_for->trx->abort_type = TRX_SERVER_ABORT; } return(err);
Maybe I should try to write a better patch for integrating this in the new InnoDB code.
It would be great if you could help Jan with a better patch! He still has a lot of merge work to do and the whole server team is waiting on Jan to be ready so that we can add the final touches and release MariaDB 10.2-beta.
What do you think about changing this to use the async deadlock kill in background thread, as discussed in this thread?
https://lists.launchpad.net/maria-developers/msg09902.html
This would allow to simplify the code in lock0lock.cc, and avoid the locking hacks in innobase_kill_query()?
I was trying to find the exact patch/patches you are referring to. https://github.com/knielsen/server/commit/841ada8c8ac39c024cd1eafe4b346deecb... https://github.com/knielsen/server/commit/b256733df2cf9f10d38e44ca4979843a3b... Is it possible for you to create a clean patch for async deadlock for bb-10.2-jan that Jan and I can review and apply? Regards and thanks, Monty
Michael Widenius <michael.widenius@gmail.com> writes:
What do you think about changing this to use the async deadlock kill in background thread, as discussed in this thread?
https://lists.launchpad.net/maria-developers/msg09902.html
This would allow to simplify the code in lock0lock.cc, and avoid the locking hacks in innobase_kill_query()?
I was trying to find the exact patch/patches you are referring to.
https://github.com/knielsen/server/commit/841ada8c8ac39c024cd1eafe4b346deecb...
https://github.com/knielsen/server/commit/b256733df2cf9f10d38e44ca4979843a3b...
Is it possible for you to create a clean patch for async deadlock for bb-10.2-jan that Jan and I can review and apply?
Yes, I'll prepare a patch for review. It's probably easier if I apply it myself, as I want to do a minimal patch in 10.1 to get optimistic parallel replication working with TokuDB; and a full patch in 10.2 which includes the InnoDB simplifications. - Kristian.
Hi!
Is it possible for you to create a clean patch for async deadlock for bb-10.2-jan that Jan and I can review and apply?
Yes, I'll prepare a patch for review. It's probably easier if I apply it myself, as I want to do a minimal patch in 10.1 to get optimistic parallel replication working with TokuDB; and a full patch in 10.2 which includes the InnoDB simplifications.
It would be great if you could do the above! Feel free to push this into the bb-10.2-jan tree or into the 10.2 tree when Jan is ready with his merge! Regards, Monty
Michael Widenius <michael.widenius@gmail.com> writes:
Yes, I'll prepare a patch for review. It's probably easier if I apply it myself, as I want to do a minimal patch in 10.1 to get optimistic parallel replication working with TokuDB; and a full patch in 10.2 which includes the InnoDB simplifications.
It would be great if you could do the above!
Ok, so I prepared patches for this: http://lists.askmonty.org/pipermail/commits/2016-September/009740.html http://lists.askmonty.org/pipermail/commits/2016-September/009741.html http://lists.askmonty.org/pipermail/commits/2016-September/009742.html (Also available here: https://github.com/knielsen/server/commits/montyrpl) I would like to push the first of these into 10.1, in order to fix TokuDB. It has no functional changes for InnoDB/XtraDB. It introduces a new slave background thread, Monty could you check that code and see if it looks ok with respect to server shutdown and such? The two other patches change InnoDB to use the async deadlock kill, and remove the old locking hacks around innobase_kill_query() and such. In fact, I still got locking violation assertions without these patches. These last two patches should only go into 10.2. I think it will be good to get the code cleaned up like this. Serg, are you ok with making parallel replication deadlock kill happen asynchronously? A couple years ago we discussed not doing this after review (https://lists.launchpad.net/maria-developers/msg07483.html). This async kill fixes locking problems in both TokuDB and (10.2) InnoDB. Also, the new patch fixes the original problem; now there is no strange requirement for storage engines to not report false positives. All a storage engine needs to do is call thd_rpl_deadlock_check(waiting_thd, blocking_thd) to resolve possible deadlocks in optimistic parallel replication. No tricky semantics.
Feel free to push this into the bb-10.2-jan tree or into the 10.2 tree when Jan is ready with his merge!
Thanks. However, I see that Jan seems to be rebasing his bb-10.2-jan tree. That doesn't work well with multiple developers pushing into a tree. Also, these patches should not be folded into a large, anonymous "InnoDB 5.7 merge" commit message. One option is that I could push the first patch into 10.1, and all three patches into (main) 10.2. And then Jan can pick it up from main 10.2? Or do you prefer me pushing them into a shared tree (bb-10.2-jan or whatever)? - Kristian.
On Fri, Sep 2, 2016 at 2:17 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Michael Widenius <michael.widenius@gmail.com> writes:
Ok, so I prepared patches for this:
http://lists.askmonty.org/pipermail/commits/2016-September/009740.html http://lists.askmonty.org/pipermail/commits/2016-September/009741.html http://lists.askmonty.org/pipermail/commits/2016-September/009742.html
(Also available here: https://github.com/knielsen/server/commits/montyrpl)
The two other patches change InnoDB to use the async deadlock kill, and remove the old locking hacks around innobase_kill_query() and such. In fact, I still got locking violation assertions without these patches. These last two patches should only go into 10.2. I think it will be good to get the code cleaned up like this.
In my opinion two other patches that change InnoDB should also go to 10.1, this looks a lot more robust and cleaner than current hack with mutexes and states to identify which mutexes we have. Similar mechanism could be then used also on Galera where we still have problems with these mutexes. R: Jan
Jan Lindström <jan.lindstrom@mariadb.com> writes:
In my opinion two other patches that change InnoDB should also go to 10.1, this looks a lot more robust and cleaner than current hack with mutexes and states to identify which mutexes we have. Similar mechanism could be
I don't see how to justify putting the InnoDB changes into stable 10.1 (in non-galera versions)? They do not provide any concrete benefit to users, and there is real risk that this could break stuff, the thread interactions here are quite complex. The first patch though is relatively safe for 10.1. It doesn't change innodb workings. And it is a pre-requisite to fix optimistic parallel replication for TokuDB. - Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
Ok, so I prepared patches for this:
http://lists.askmonty.org/pipermail/commits/2016-September/009740.html http://lists.askmonty.org/pipermail/commits/2016-September/009741.html http://lists.askmonty.org/pipermail/commits/2016-September/009742.html
I would like to push the first of these into 10.1, in order to fix TokuDB. It has no functional changes for InnoDB/XtraDB. It introduces a
This was pushed into 10.1.
The two other patches change InnoDB to use the async deadlock kill, and remove the old locking hacks around innobase_kill_query() and such. In fact,
I've now pushed these two patches into 10.2. So now in 10.2, InnoDB deadlock kill for parallel replication happens asynchroneously, and there is no need for locking hacks around this. - Kristian.
On Thu, Sep 1, 2016 at 12:31 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Michael Widenius <michael.widenius@gmail.com> writes:
This is because of errorneous merge. The original code:
if (waitee_buf_ptr) { lock_report_waiters_to_mysql(waitee_buf_ptr, start_mysql_thd, victim_trx_id);
The bb-10.2-jan code:
if (victim_trx && waitee_buf_ptr) { lock_report_waiters_to_mysql(waitee_buf_ptr, start_mysql_thd, victim_trx->id);
So if victim_trx is NULL the waits are not reported to parallel replication at all, causing the stalls and/or hangs. victim_trx is NULL unless InnoDB itself detects a deadlock.
Kristian, thanks for pointing this to me, I was not sure what this function was doing, so I merged it incorrectly. R: Jan
participants (3)
-
Jan Lindström
-
Kristian Nielsen
-
Michael Widenius