Hi! <cut>
2016-09-01 10:33:20 140078976283392 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-53' for key 'PRIMARY', Gtid 0-1-52, Internal MariaDB error code: 1942
This happens because the mysql.gtid_slave_pos table is MyISAM (which is the default in mysql-test-run, but not in a normal server install), and parallel replication needs to roll back a transaction after it has updated the table. Because the table is MyISAM, the gtid_slave_pos change cannot be rolled back.
Sorry about the MyISAM part. I did say in #maria right after I sent the email that the problem with 10.2 was that the wrong engine was used, but apparently you missed that.
Maybe parallel replication could in this case manually undo its change in the table as part of the rollback. It's just a DELETE of the row previously inserted.
That could be a solution. In any case, I should at least look at adding a better error message if this happens.
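The manual undo idea can be illustrated with a small toy model. This is only a sketch under stated assumptions: GtidPosTable and apply_event() are hypothetical names, the table is modelled as a set keyed by (domain_id, sub_id), and nothing here is actual server code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Toy model of a non-transactional table with primary key (domain_id, sub_id).
// Hypothetical type for illustration only; not the server's implementation.
struct GtidPosTable {
    std::set<std::pair<std::uint32_t, std::uint64_t>> rows;

    // Returns false on a duplicate key, loosely mirroring error 1062.
    bool insert(std::uint32_t domain_id, std::uint64_t sub_id) {
        return rows.emplace(domain_id, sub_id).second;
    }

    void erase(std::uint32_t domain_id, std::uint64_t sub_id) {
        rows.erase(std::make_pair(domain_id, sub_id));
    }
};

// One apply attempt: insert the position row, then possibly roll back.
// The "engine" cannot undo the insert, so on rollback the row stays
// unless the caller compensates with an explicit delete.
// Returns false if the insert hit a duplicate key.
bool apply_event(GtidPosTable& table, std::uint32_t domain_id,
                 std::uint64_t sub_id, bool must_roll_back, bool compensate) {
    if (!table.insert(domain_id, sub_id)) {
        return false;
    }
    if (must_roll_back && compensate) {
        table.erase(domain_id, sub_id);  // manual undo of the earlier insert
    }
    return true;
}
```

Without the compensating delete, a rolled-back attempt leaves its row behind and the retry fails on the duplicate key; with it, the retry succeeds.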
In any case, currently the fix is to use InnoDB for the table:
--- rpl_skr.test~ 2016-09-01 10:27:21.214633498 +0200
+++ rpl_skr.test 2016-09-01 10:35:50.660242337 +0200
@@ -8,6 +8,9 @@
 --connection server_2
 SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
 --source include/stop_slave.inc
+SET sql_log_bin=0;
+ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
+SET sql_log_bin=1;
 SET GLOBAL slave_parallel_threads=10;
 SET GLOBAL slave_parallel_mode='conservative';
 --source include/start_slave.inc
Yes, same fix that I did.
The bb-10.2-jan tree is a working tree for a merge of MariaDB 10.2 and MySQL 5.7.
When running rpl_skr in 10.2, it takes 2 seconds. When running it in the bb-10.2-jan tree, it either takes a long time or we get a timeout.
This is because of an erroneous merge. The original code:
if (waitee_buf_ptr) {
        lock_report_waiters_to_mysql(waitee_buf_ptr, start_mysql_thd, victim_trx_id);
The bb-10.2-jan code:
if (victim_trx && waitee_buf_ptr) {
        lock_report_waiters_to_mysql(waitee_buf_ptr, start_mysql_thd, victim_trx->id);
So if victim_trx is NULL the waits are not reported to parallel replication at all, causing the stalls and/or hangs. victim_trx is NULL unless InnoDB itself detects a deadlock.
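The difference between the two guards can be shown in isolation. This is a minimal sketch with hypothetical stand-ins (Trx, reported_original(), reported_merged()); it only models the conditions and is not the InnoDB code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an InnoDB transaction; not the real trx_t.
struct Trx {
    unsigned long id;
};

// Original guard: report whenever a waitee buffer exists.
// Returns how many waits would be handed to parallel replication.
std::size_t reported_original(const std::vector<int>* waitee_buf,
                              const Trx* victim_trx) {
    (void)victim_trx;  // the victim does not decide whether we report
    if (waitee_buf) {
        return waitee_buf->size();
    }
    return 0;
}

// Merged guard: additionally requires a deadlock victim, so waits are
// dropped whenever InnoDB itself found no deadlock.
std::size_t reported_merged(const std::vector<int>* waitee_buf,
                            const Trx* victim_trx) {
    if (victim_trx && waitee_buf) {
        return waitee_buf->size();
    }
    return 0;
}
```

With three recorded waits and no victim, the original guard reports all three and the merged guard reports none, which is the case that leaves parallel replication waiting for a timeout.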
I've attached a patch that fixes this; it can also be pulled from here:
https://github.com/knielsen/server/commits/montyrpl
Or should I push it directly into bb-10.2-jan? This makes the rpl_skr.test complete correctly in < 1 second.
Thanks a lot for the patch! Jani will pull it into his working tree.
This is probably because of the new lock code in lock0lock.cc and lock0wait.cc, which doesn't break the conflicting transaction but instead waits for a timeout.
The merge appears very rough. Shouldn't the waitee_buf be integrated into the new DeadlockChecker class? Why is it necessary to call thd_report_wait_for() on internal transactions, as is done here?
/* m_trx->mysql_thd is NULL if it's an internal trx. So current_thd is used */
if (err == DB_LOCK_WAIT) {
        ut_ad(wait_for && wait_for->trx);
        wait_for->trx->abort_type = TRX_REPLICATION_ABORT;
        thd_report_wait_for(current_thd, wait_for->trx->mysql_thd);
        wait_for->trx->abort_type = TRX_SERVER_ABORT;
}
return(err);
Maybe I should try to write a better patch for integrating this in the new InnoDB code.
It would be great if you could help Jan with a better patch! He still has a lot of merge work to do and the whole server team is waiting on Jan to be ready so that we can add the final touches and release MariaDB 10.2-beta.
What do you think about changing this to use the async deadlock kill in a background thread, as discussed in this thread?
https://lists.launchpad.net/maria-developers/msg09902.html
This would allow us to simplify the code in lock0lock.cc and avoid the locking hacks in innobase_kill_query().
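As a rough illustration of the general idea only (the actual patch in the linked thread may look quite different), an asynchronous kill can be modelled as a queue drained by a background thread. AsyncKiller and its members are hypothetical names invented for this sketch; the real kill is replaced by recording the thread id.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch: lock code only enqueues a kill request, and a
// background thread performs it later, outside any lock-manager mutex.
class AsyncKiller {
public:
    AsyncKiller() : worker_([this] { run(); }) {}
    ~AsyncKiller() { stop(); }

    // Cheap call intended for use from lock code: enqueue and return.
    void request_kill(std::uint64_t thread_id) {
        {
            std::lock_guard<std::mutex> guard(mu_);
            pending_.push(thread_id);
        }
        cv_.notify_one();
    }

    // Let the worker drain remaining requests, then join it.
    void stop() {
        {
            std::lock_guard<std::mutex> guard(mu_);
            if (stopping_) return;
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Snapshot of the ids processed so far, in processing order.
    std::vector<std::uint64_t> killed() {
        std::lock_guard<std::mutex> guard(mu_);
        return killed_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(mu_);
        for (;;) {
            cv_.wait(lk, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stopping and fully drained
            std::uint64_t id = pending_.front();
            pending_.pop();
            lk.unlock();
            // A real implementation would kill the victim here, with the
            // queue mutex released and no lock-manager mutex held.
            lk.lock();
            killed_.push_back(id);
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::uint64_t> pending_;
    std::vector<std::uint64_t> killed_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist first
};
```

The point of the design is that the requester never performs the kill itself, so it does not need to juggle mutexes at the point where the conflict is detected.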
I was trying to find the exact patch/patches you are referring to.
https://github.com/knielsen/server/commit/841ada8c8ac39c024cd1eafe4b346deecb...
https://github.com/knielsen/server/commit/b256733df2cf9f10d38e44ca4979843a3b...
Is it possible for you to create a clean patch for async deadlock for bb-10.2-jan that Jan and I can review and apply?
Regards and thanks,
Monty