Sergei Golubchik <serg@mariadb.org> writes:
As you suggested on IRC, it would make sense to make a smaller InnoDB/XtraDB-only fix in 10.0 and a more engine-friendly one, with the new API, in 10.1.
Hmm, okay... When you put it this way, it does sound simpler. All right, let's keep thd_report_wait_for() :)
Right. So then, the plan seems to be:

1. I remove the new calls from include/plugin.h, instead place them somewhere not part of a public API (maybe just "extern" declarations inside InnoDB/XtraDB, and whatever is needed to make it work correctly for ha_innodb.dll on Windows).

2. I try to remove the kill-in-background, instead do it directly in the thread doing thd_report_wait_for() (I think that should be possible).

3. I apply the other review comments that you sent in another mail.

4. I file a Jira task for 10.1 about a general solution, with a good API and other ideas collected so far.

Other than this, the patch will be much the same as what I had initially. Is this ok with you? Or did I miss something?
It's not the expense that worries me. The problem is that some of the following transactions may not be possible to roll back.
Ah, yes, indeed. We could still
1. roll back regardless and possibly break replication in this case, saying that a transactional engine will work without modifications in most cases, but not when it's mixed with non-trans updates
2. as discussed, have a flag to mark non-trans-updates transactions and don't run them in parallel at all. then a transactional engine will work without modifications.
but that's for 10.1, if we do innodb-only fix in 10.0, it means we aren't concerned with other engines there.
Yes, agree. Seems like a reasonable solution. I share your concerns about the current solution, and some of these ideas seem likely to solve most of the issues better, but they are better suited to the next major release.
How can T2 run in parallel with T1 if they're from different groups?
T2 can run in parallel with the commit step of T1, but not with any events of T1 prior to commit.

In more detail: Suppose we have 4 transactions in two group commits: (T1, T2) followed by (T3, T4). We will schedule T1, T2, T3, and T4 in parallel, each on its own thread (assuming @@slave_parallel_threads >= 4).

T1 and T2 are in the same group commit, so they are allowed to start immediately. However, T3 and T4 are in a different group commit, so they are not ready to start - they might conflict with T1 or T2. So they wait.

Suppose T2 reaches its COMMIT (or XID) event first. It calls mark_start_commit(), however at this point this does not do anything. T2 has commit order after T1, so it goes to wait for T1 in wait_for_prior_commit().

Now T1 reaches its COMMIT/XID event, and calls mark_start_commit(). Now both T1 and T2 have completed all their modifications, and are ready to commit. This means that we can now start running T3 and T4. T3 or T4 might have conflicting rows with T1 or T2, but T1 and T2 have already done all their modifications, so it's ok. If there is a conflict, T3 and T4 will just wait. If not, T3 and T4 can run in parallel with the commit steps of T1 and T2.

Suppose both T3 and T4 have time to reach their COMMIT/XID event before T1 has time to complete commit. Then T1 can find T2, T3, and T4 all queued up for group commit. And T1 can do a single group commit for all four of them, sharing the fsync overhead among 4 transactions.

This way, we get more opportunity for parallelism. This optimisation (starting T3/T4 at _start_ of T1/T2 commit, rather than after) is particularly effective when commit is expensive, e.g. with --sync-binlog=1 and --innodb-flush-log-at-trx-commit=1. It makes effective use of group commit possible. It also makes it possible to improve parallelism on slaves deeper down in the hierarchy, using --binlog-commit-wait-count. Without this, the group commit parallelism from a slave would always be less than (or equal to) that on the master.
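
To make the start rule concrete, here is a small toy model of the (T1, T2), (T3, T4) example. It is only a sketch of the rule as described in this mail, not the server code: the Scheduler class, can_start(), and ready() are made-up names for illustration, and only mark_start_commit() borrows its name from the real function. It assumes a transaction may start once every transaction in all earlier group commits has reached its COMMIT/XID event.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Toy model of the start rule (hypothetical, not MariaDB code)."""
    groups: dict  # transaction name -> group commit number (0, 1, ...)
    started_commit: set = field(default_factory=set)

    def mark_start_commit(self, txn):
        # txn reached COMMIT/XID: all of its row modifications are done.
        self.started_commit.add(txn)

    def can_start(self, txn):
        # txn may start once every transaction from an earlier group
        # commit has reached its commit step.
        mine = self.groups[txn]
        return all(t in self.started_commit
                   for t, g in self.groups.items() if g < mine)

    def ready(self):
        # Names of all transactions currently allowed to run.
        return sorted(t for t in self.groups if self.can_start(t))


s = Scheduler({"T1": 0, "T2": 0, "T3": 1, "T4": 1})
before = s.ready()         # only the first group may run
s.mark_start_commit("T2")
after_t2 = s.ready()       # T1 has not reached COMMIT yet
s.mark_start_commit("T1")
after_t1 = s.ready()       # T3 and T4 are released
print(before, after_t2, after_t1)
```

The model leaves out the in-order commit itself (wait_for_prior_commit()) and any row-lock waits between T3/T4 and T1/T2; it only shows when the second group is released.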
Thanks, - Kristian.