Hi Serg,

We discussed on IRC a review of the $subj bugs related to deadlocks in parallel replication and handling of automatic retry of transactions in case of deadlock.

The patches are in my tree:

  lp:~maria-captains/maria/10.0-knielsen

The complete diff can be obtained e.g. with:

  bzr diff -rrevid:psergey@askmonty.org-20140429143317-tkmxrzu6y36hpi7t..revid:knielsen@knielsen-hq.org-20140610081315-o5nx3u32uvqaai7q

Let me give some background for these bugs:

When two transactions T1 and T2 group-commit on the master, we assume that they have no lock conflicts (otherwise one would have had to wait for the other to commit). So we run them in parallel on the slave. But before we let T2 commit, we wait until T1 has committed first, so that we get the same binlog order on the slave as we had on the master.

But if T1 and T2 somehow _do_ get a lock conflict on the slave, and T1 ends up waiting for some lock held by T2, we get a deadlock. This causes the replication to hang (the wait of T2 for T1 to commit is not part of InnoDB deadlock detection), which is quite bad.

Elena's testing of the parallel replication in fact uncovered a number of cases where such deadlock issues occur:

1. MDEV-5914. It turns out that InnoDB gap locks can work in a non-symmetric way. If T1 is an INSERT and T2 is a DELETE, then T1 can run first and not block T2. But if T2 runs first, it may set a gap lock that blocks T1. Thus, depending on the exact execution order, we can get T1 and T2 to group-commit on the master, but T2 block T1 on the slave and cause a deadlock.

2. MDEV-5941. If e.g. an UPDATE uses different execution plans on the master and slave, it might use an index scan on the master, but a table scan on the slave. Thus it may take more locks on the slave than on the master, causing it to conflict with another transaction that is run in parallel and cause a deadlock.

3. MDEV-6020. Here, there is a multi-value INSERT into multiple partitions of a table with an auto-increment key. It appears that in this case InnoDB takes an auto-increment table lock on each partition. Since these locks are taken in the order of the inserted values, they can be taken in a different order by different INSERT statements, and can cause a deadlock. Since these locks are released during statement execution (before the commit), such a deadlock can occur on the slave even though it did not occur on the master. (Note that this deadlock _is_ detected by InnoDB, and is not related to the commit order on the parallel slave.)

I know a way to prevent (1) on the slave (and it is implemented in the patches). I do not know if there is a way to prevent (3); it would require a better understanding of how InnoDB auto-increment and partitioning work. I do not see any way to prevent (2) from occurring on the slave in the general case.

It also seems to me that there are likely to be more corner cases lurking, and even if I somehow managed to fix the three problems above, there would be some other cases where an extra lock conflict can occur on the slave and cause the slave to hang. So I decided that a general solution is needed for such deadlocks.

So the currently proposed patch does two things:

A. It adds code to InnoDB, so that whenever T1 starts to wait inside InnoDB for a lock held by T2, this wait is reported to the upper server layer. The replication code will then check if T1 and T2 are part of parallel replication such that T1 must commit before T2. If so, we have a deadlock, and the code will terminate T2 with a deadlock error, allowing T1 to proceed.
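To make (A) more concrete, here is a small standalone sketch of the check I have in mind when the storage engine reports a lock wait to the upper layer. All the names (RplTrx, report_lock_wait, commit_seq, and so on) are invented just for the illustration; the real patch hooks this decision into the InnoDB lock wait code and the parallel replication state on the THDs:

  // Standalone sketch (not the actual patch): models the decision made when
  // InnoDB reports "waiter has started waiting on a lock held by holder" up
  // to the server layer.  All names here are invented for the illustration.
  #include <cstdint>
  #include <cstdio>

  struct RplTrx {
    bool     is_parallel_worker;  // running as a parallel replication worker?
    uint64_t commit_seq;          // enforced commit order (smaller commits first)
    bool     killed_for_retry;    // set when we decide to abort and retry it
  };

  // If the commit order forces `waiter` to commit before `holder`, then
  // `holder` can never release its locks until `waiter` proceeds -- a deadlock
  // that InnoDB alone cannot see.  Resolve it by terminating `holder` with a
  // deadlock error; the replication code then rolls it back and retries it (B).
  static bool report_lock_wait(RplTrx *waiter, RplTrx *holder) {
    if (!waiter->is_parallel_worker || !holder->is_parallel_worker)
      return false;                     // not our case; a normal lock wait
    if (waiter->commit_seq < holder->commit_seq) {
      holder->killed_for_retry = true;  // break the cycle: kill the later trx
      return true;
    }
    return false;                       // holder commits first anyway; just wait
  }

  int main() {
    RplTrx t1{true, 100, false};        // must commit first
    RplTrx t2{true, 101, false};        // must commit after t1
    // T1 ends up waiting on a lock held by T2 (e.g. the MDEV-5914 gap lock case):
    bool deadlock = report_lock_wait(&t1, &t2);
    std::printf("deadlock=%d  t2 killed for retry=%d\n",
                deadlock, t2.killed_for_retry);
    return 0;
  }
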
B. It implements code to handle deadlocks during parallel replication. When a transaction gets a deadlock error (normal, or provoked by the new deadlock detection code from (A)), it is rolled back and re-executed.

The result should be that even if there are other, undiscovered cases where locks can conflict in different ways on the master and on the parallel slave, such conflicts will not cause replication to fail. Instead, they will be detected and handled automatically by resolving the deadlock. I think this will be important to ensure the reliability of parallel replication for users.

I went through quite a few different ideas and attempts before ending up with the present solution. So I wanted to give this background about how the patch relates to the basic problem that needs to be solved.

I found a _lot_ of tricky issues during testing, and so I am somewhat unhappy about the complexity of this, even though the final patch may look simple enough on the surface. Maybe you will have some ideas for how this could be done more simply, or at least question anything that looks incorrect or unnecessary.

Thanks,

 - Kristian.
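P.S. In case it helps as a map while reading the retry code in (B), here is a tiny standalone sketch of the control flow I am aiming for. Again, all names (apply_with_retry, ExecResult, max_retries) are invented for the illustration; the real code of course has much more to deal with (rolling back properly, re-executing the events, and still committing in the enforced order):

  // Standalone sketch (not the actual patch): the retry loop a parallel
  // replication worker conceptually runs around one transaction.
  #include <cstdio>
  #include <functional>

  enum class ExecResult { OK, DEADLOCK, FATAL };

  // `execute_group` applies the transaction's events; it must be repeatable,
  // which is why `rollback_group` has to undo everything before a new attempt.
  static bool apply_with_retry(const std::function<ExecResult()> &execute_group,
                               const std::function<void()> &rollback_group,
                               int max_retries) {
    for (int attempt = 0; ; attempt++) {
      ExecResult res = execute_group();
      if (res == ExecResult::OK)
        return true;                     // committed (in the enforced order)
      rollback_group();                  // release all locks taken so far
      if (res == ExecResult::FATAL || attempt >= max_retries) {
        std::fprintf(stderr, "giving up after %d attempt(s)\n", attempt + 1);
        return false;                    // a real error -> stop the slave
      }
      // Deadlock (detected by InnoDB, or provoked by the check in (A)):
      // simply try the whole transaction again from the start.
      std::fprintf(stderr, "deadlock on attempt %d, retrying\n", attempt + 1);
    }
  }

  int main() {
    int fail_times = 2;  // pretend the first two attempts hit a deadlock
    bool ok = apply_with_retry(
        [&] { return fail_times-- > 0 ? ExecResult::DEADLOCK : ExecResult::OK; },
        [] { /* roll back the partially executed transaction */ },
        /*max_retries=*/10);
    std::printf("replicated ok=%d\n", ok);
    return 0;
  }
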