Hello Kristian,
I am running your opt2 branch with a small sysbench OLTP test (1 table, 1000 rows, 8 threads). The good news is that the slave stalls due to lock timeouts are gone. The bad news is that the slave performance is suspect.
When the slave is in conservative mode with 2 threads, the TokuDB wait for callback is being called (I put in a "printf"), which implies a parallel lock conflict. I assumed that conservative mode implies parallel execution of transactions that were group committed together, which I assumed would imply that these transactions were conflict free. Obviously that is not the case.
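To make that assumption concrete, here is a minimal sketch in Python of the check I had in mind: group transactions by the commit id they shared on the master and flag any pair within a group that touches the same row. Everything in it is hypothetical (the function name, the tuple layout, and the sample data are made up for illustration and are not taken from the server code). It only compares exact row keys, so it does not model whatever extra locks a storage engine may take.

```python
from collections import defaultdict
from itertools import combinations


def conflicts_within_groups(transactions):
    """Return row-level overlaps between transactions of the same commit group.

    transactions: iterable of (txn_id, commit_id, row_keys) tuples.
    This layout is an assumption made for the sketch, not a server structure.
    """
    # Bucket transactions by the commit id they were group committed under.
    groups = defaultdict(list)
    for txn_id, commit_id, row_keys in transactions:
        groups[commit_id].append((txn_id, set(row_keys)))

    conflicts = []
    for commit_id, members in groups.items():
        # Compare every pair inside one group; any shared row is a conflict.
        for (a, rows_a), (b, rows_b) in combinations(members, 2):
            shared = rows_a & rows_b
            if shared:
                conflicts.append((commit_id, a, b, sorted(shared)))
    return conflicts


# Made-up sample: group 100 is disjoint, group 101 shares row 5.
sample = [
    ("t1", 100, {1, 2}),
    ("t2", 100, {3}),
    ("t3", 101, {2, 5}),
    ("t4", 101, {5, 9}),
]
print(conflicts_within_groups(sample))  # [(101, 't3', 't4', [5])]
```

If the assumption held, a check like this would always come back empty for transactions the slave runs in parallel in conservative mode.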
When the slave is in optimistic mode with 8 threads, I see very high slave query execution times in the processlist:
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
| 6 | root | localhost | NULL | Query | 0 | init | show processlist | 0.000 |
| 16 | system user | | NULL | Connect | 383 | Waiting for master to send event | NULL | 0.000 |
| 17 | system user | | NULL | Connect | 7 | Waiting for prior transaction to commit | NULL | 0.000 |
| 18 | system user | | NULL | Connect | 3 | Waiting for prior transaction to commit | NULL | 0.000 |
| 19 | system user | | NULL | Connect | 3 | Waiting for prior transaction to commit | NULL | 0.000 |
| 20 | system user | | NULL | Connect | 3 | Delete_rows_log_event::find_row(-1) | NULL | 0.000 |
| 21 | system user | | NULL | Connect | 3 | Waiting for prior transaction to commit | NULL | 0.000 |
| 22 | system user | | NULL | Connect | 3 | Waiting for prior transaction to commit | NULL | 0.000 |
| 23 | system user | | NULL | Connect | 7 | Waiting for prior transaction to commit | NULL | 0.000 |
| 24 | system user | | NULL | Connect | 3 | Waiting for prior transaction to commit | NULL | 0.000 |
| 25 | system user | | NULL | Connect | 382 | Waiting for room in worker thread event queue | NULL | 0.000 |
It appears that there is a multi-second stall somewhere. gdb shows that the threads are either
(1) waiting in the tokudb lock manager, or
(2) waiting in the wait_for_commit::wait_for_prior_commit2 function.