Sergei Golubchik <serg@mariadb.org> writes:
On May 26, Nirbhay Choubey wrote:
+ if (wait_for_prior_commit())
Please call this method tmptable_wait_for_prior_commit() or something like that. To avoid confusion with THD::wait_for_prior_commit().
you're doing it in almost every method. And when one method calls another, you're doing it twice. Or thrice. Isn't is too much? (I don't know)
I added these additional waits to fix some failing replication tests. But, whit new TMP_TABLE_SHARE approach, this could have changed. I have now reverted them to the original state and will run rpl test to assure if we do not need additional waits.
That won't mean much. Temporary table handling in statement-based replication is very nasty code, and we surely do not have full test coverage of possible issues. And bugs only turn up as rare race conditions that are hard to reproduce. So you will have to make sure you understand in detail exactly where wait_for_prior_commit() (and possibly other needed code) is needed to ensure statement-based parallel replication will work in all cases (or at least work as well as the non-parallel case). The issue here is that temporary tables have no locking facilities. And the same temporary table can be used by multiple different transactions, which in parallel replication can be run in different threads. There is code that keeps track of a shared list of temporary tables, and shuffles it around between threads as needed (this is also needed for non-parallel replication, since temporary tables might need to be kept across SQL thread destruction and re-creation). See Relay_log_info::save_temporary_tables (as you probably already saw). Maybe Monty knows something about this (ISTR that he wrote that code originally), otherwise careful study of the code is probably the only way. Since there is no locking of temporary tables (at least before your patch?), it would be possible for two worker threads to simultaneously try to use the same temptable, which fails, obviously. This is handled rather crudely in the current code by simply serialising all transactions using temporary tables with all prior transactions. This is what wait_for_prior_commit() does - it waits for all transactions to commit that come before in the replication stream (within that replication domain). Hope this helps, - Kristian.