Rich Prohaska <prohaska7@gmail.com> writes:
The group lock retry algorithm is on the https://github.com/prohaska7/tokuft/tree/killwait branch. Its unit tests pass. I needed to add some test-only functions to get reproducible behaviour.
The group lock retry algorithm is integrated into my mariadb server on the https://github.com/prohaska7/mariadb-server/tree/toku_opr3 branch. Ran sysbench oltp on a small 1000 row table successfully.
Looks great, thanks! It passes tests for me, as well.
I am going to write up the tokudb lock tree races that were fixed and email to George Lorch @ Percona so that this code can be integrated into PerconaFT.
Ok, sounds great! I will push the replication part of the patch to MariaDB 10.1, then (the async deadlock kill).
From the git history, it looks like new TokuDB releases (from Percona Server) are regularly merged into MariaDB 10.1, so I'm thinking that we can get your TokuDB/tokuft changes into MariaDB that way, in the next regular TokuDB merges. I will check it and add any missing MariaDB stuff, if it is not part of the changes that go upstream.
Does that sound ok to you?
Removed the lock-wait-for report from the lock request start method, since it is redundant with the report that will occur when the lock request is retried in the lock request wait method.
The reason I added this reporting originally was for the case where a deadlock is detected. If transaction T1 tries to get a lock with lock_request::start(), but a deadlock is detected (DB_LOCK_DEADLOCK is returned), the lock request wait method will not be called (if I understand the code correctly), so the reporting in lock_request::start() was not redundant.
The rationale is that if T1 gets aborted due to a deadlock with T2, and T2 is later in the replication commit order, then when T1 is run again by replication, it will almost certainly conflict with T2 again. So we might as well get T2 killed early (by doing the report already in start()).
But on the other hand, things will work correctly without any reporting in start(), and with only a slight delay in case of a conflict. And the assumption in optimistic parallel replication is that conflicts will be relatively rare. So I'm fine without reporting in start(), as you have in your current code.
Looks like we are close now to having optimistic parallel replication working with TokuDB. Thanks for all your work on this, Rich!
 - Kristian.
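For illustration, the deadlock case discussed above can be sketched with a toy model. This is not the tokuft code: ToyLockTable, its start()/wait() methods, the report_in_start flag, and run_scenario() are all hypothetical names made up for this sketch, and the return codes are plain strings standing in for the real constants. It only shows that, when start() returns the deadlock code and wait() is never called, a report made only in wait() never covers the (T1, T2) pair.

```python
# Toy model only: names echo the discussion, not the real tokuft API.
OK = "OK"
DB_LOCK_NOTGRANTED = "DB_LOCK_NOTGRANTED"
DB_LOCK_DEADLOCK = "DB_LOCK_DEADLOCK"


class ToyLockTable:
    def __init__(self, report_in_start):
        self.owner = {}        # key -> transaction holding the lock
        self.waits_for = {}    # waiting transaction -> blocking transaction
        self.reports = []      # (waiter, holder) pairs reported so far
        self.report_in_start = report_in_start

    def _would_deadlock(self, txn, holder):
        # Follow waits-for edges from the holder; reaching txn means a cycle.
        seen = set()
        cur = holder
        while cur is not None and cur not in seen:
            if cur == txn:
                return True
            seen.add(cur)
            cur = self.waits_for.get(cur)
        return False

    def start(self, txn, key):
        holder = self.owner.get(key)
        if holder is None or holder == txn:
            self.owner[key] = txn
            return OK
        if self.report_in_start:
            self.reports.append((txn, holder))
        if self._would_deadlock(txn, holder):
            return DB_LOCK_DEADLOCK   # caller aborts; wait() is never called
        self.waits_for[txn] = holder
        return DB_LOCK_NOTGRANTED

    def wait(self, txn, key):
        # Retry the request once; report the conflict if it is still blocked.
        holder = self.owner.get(key)
        if holder is None or holder == txn:
            self.owner[key] = txn
            self.waits_for.pop(txn, None)
            return OK
        self.reports.append((txn, holder))
        return DB_LOCK_NOTGRANTED


def run_scenario(report_in_start):
    table = ToyLockTable(report_in_start)
    table.start("T1", "a")            # T1 holds a
    table.start("T2", "b")            # T2 holds b
    if table.start("T2", "a") == DB_LOCK_NOTGRANTED:
        table.wait("T2", "a")         # T2 blocks on T1, reported in wait()
    rc = table.start("T1", "b")       # closes the cycle T1 -> T2 -> T1
    assert rc == DB_LOCK_DEADLOCK
    return table.reports
```

In this model, run_scenario(False) records only the (T2, T1) pair, while run_scenario(True) also records (T1, T2), at the cost of reporting (T2, T1) twice. That matches the trade-off described above: reporting in start() is redundant in the ordinary wait path and only adds information in the deadlock path.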