[Maria-developers] How handle_rpl_parallel_thread rollback partial transaction when STOP SLAVE In parallel replication
Hi Kristian, I have a question about parallel replication. In handle_rpl_parallel_thread(), a worker thread has received all the events of a transaction but has applied only some of them. On STOP SLAVE, the thread skips the remaining events (is that right?) because sql_worker_killed() returns true. But it seems the partial transaction is never rolled back, because wait_for_prior_commit() always returns false. Am I wrong? If not, how is that partial transaction rolled back? 2014-02-25 nanyi607rao
"nanyi607rao" <nanyi607rao@gmail.com> writes:
> Hi Kristian,

Hi nanyi607rao,

> In handle_rpl_parallel_thread(), a worker thread has received all the events of a transaction but has applied only some of them. On STOP SLAVE, the thread skips the remaining events (is that right?) because sql_worker_killed() returns true. But it seems the partial transaction is never rolled back, because wait_for_prior_commit() always returns false. Am I wrong? If not, how is that partial transaction rolled back?
Yes, you are right. This is a bug in the current code. I am actually working on a fix for this problem at the moment (and for a number of other similar problems related to normal stop or error stop). I am sorry that I didn't manage to fix it before you hit it in your work.

The idea to fix this is as follows:

- We record which transactions have started to commit.
- When we do STOP SLAVE, we remember the transaction that last started to commit at the point at which we stopped.
- In handle_rpl_parallel_thread(), we only start skipping events from transactions that start strictly _after_ the stop point. Prior transactions have no events skipped.

I put my current patch here:

http://knielsen-hq.org/parallel_replication_patch_intermediate.diff

You can take a look if you want to see in more detail what I am doing. I believe this patch fixes the particular bug you mentioned. But the patch is not complete yet: there is still at least one known bug related to error stop, and it includes extra debug fprintf() statements and such. So you can also just wait a few days for me to finish the patch, if you prefer; I will let you know when I have something that is ready.

The new code should be a lot clearer and a lot more robust. But it sounds like you are working on some extensions to the parallel replication? In that case, my changes may cause you some more work to adapt them to the new code, sorry for that.

Please feel free to ask any further questions you have, and I will try to answer as well and as quickly as I can.

- Kristian.
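The stop-point rule described in this message can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the code from the patch: `StopState`, `note_commit_started()`, `request_stop()`, `should_skip()` and the per-transaction sequence number `seq` are all hypothetical names, and the sketch assumes each transaction carries a number that increases in commit order.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch of the stop-point idea; none of these names are
// taken from the MariaDB source or from the patch.
struct StopState {
  // Highest sequence number of any transaction that has started to commit.
  std::atomic<uint64_t> last_started_commit{0};
  // Set once STOP SLAVE has been requested.
  std::atomic<bool> stopping{false};
  // Value of last_started_commit captured at the moment of the stop.
  std::atomic<uint64_t> stop_point{0};

  // Step 1: record that transaction `seq` has started to commit.
  void note_commit_started(uint64_t seq) {
    uint64_t cur = last_started_commit.load();
    // Only ever move the marker forward; retry if another thread raced us.
    while (seq > cur && !last_started_commit.compare_exchange_weak(cur, seq)) {
    }
  }

  // Step 2: on STOP SLAVE, remember the last transaction that started to commit.
  void request_stop() {
    stop_point.store(last_started_commit.load());
    stopping.store(true);
  }

  // Step 3: a worker skips events only for transactions strictly after the
  // stop point; transactions at or before it have no events skipped.
  bool should_skip(uint64_t seq) const {
    return stopping.load() && seq > stop_point.load();
  }
};
```

A worker would call `should_skip(seq)` before applying each event of a transaction. The sketch does not model rolling back work that a skipped transaction has already applied, nor the error-stop case mentioned above.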
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
> The idea to fix this is as follows:
> - We record which transactions have started to commit
> - When we do STOP SLAVE we remember the transaction that last started to commit at the point at which we stopped.
> - In handle_rpl_parallel_thread(), we only start skipping events from transactions that start strictly _after_ the stop point. Prior transactions have no events skipped.
> I put my current patch here:
> http://knielsen-hq.org/parallel_replication_patch_intermediate.diff
Hm, that patch is not sufficient, the bug is still there :-( Sorry about that. At least I did say that the patch was incomplete. I will fix it properly and let you know when I have something new. - Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
>> I put my current patch here:
>> http://knielsen-hq.org/parallel_replication_patch_intermediate.diff
> Hm, that patch is not sufficient, the bug is still there :-( Sorry about that. At least I did say that the patch was incomplete.
> I will fix it properly and let you know when I have something new.
Ok, I have now pushed my latest patch to 10.0-base and 10.0. This patch should be complete, it fixes all the bugs I know about, including the one you mentioned. Please let me know if you have any further questions or issues, and I will try to help. - Kristian.
participants (2)
- Kristian Nielsen
- nanyi607rao