"nanyi607rao" <nanyi607rao@gmail.com> writes:
Hi Kristian,
Hi nanyi607rao,
In handle_rpl_parallel_thread(), a worker thread has received all the events of a transaction, but it has applied only some of them. When STOP SLAVE is executed, this thread will skip the remaining events (is that right?), because sql_worker_killed() returns true. But it seems that the partial transaction won't be rolled back, since wait_for_prior_commit() always returns false. Am I wrong? If not, how would that partial transaction be rolled back?
Yes, you are right. This is a bug in the current code. I am actually working on a fix for this problem at the moment (and for a number of other similar problems related to normal stop or error stop). I am sorry that I didn't manage to fix it before you hit it in your work.

The idea of the fix is as follows:

- We record which transactions have started to commit.
- When we do STOP SLAVE, we remember the transaction that last started to commit at the point at which we stopped.
- In handle_rpl_parallel_thread(), we only start skipping events from transactions that start strictly _after_ the stop point. Prior transactions have no events skipped.

I put my current patch here:

http://knielsen-hq.org/parallel_replication_patch_intermediate.diff

You can take a look if you want to see in more detail what I am doing. I believe this patch fixes the particular bug you mentioned. But the patch is not complete yet: there is still at least one known bug related to error stop, and it includes extra debug fprintf() statements and such. So you can also just wait a few days for me to finish the patch, if you prefer. I will let you know when I have something that is ready.

The new code should be a lot clearer and a lot more robust. But it sounds like you are working on some extensions to the parallel replication? In that case, my changes may cause you some more work to adapt them to the new code, sorry for that.

Please feel free to ask any further questions you have, and I will try to answer as well and as quickly as I can.

- Kristian.
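P.S. To make the stop-point idea above a bit more concrete, here is a minimal standalone sketch. It is not taken from the patch or from the server source: StopTracker, mark_commit_started(), request_stop(), should_skip() and apply_transaction() are all hypothetical names, and I assume each transaction carries a sequence number that increases in commit order. It only models the decision of whether to skip events. It does not model rollback, nor a transaction beyond the stop point that had already applied some events before the stop was requested.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; none of these names exist in the server.
struct StopTracker {
    uint64_t last_commit_started = 0;  // highest sequence number that began to commit
    uint64_t stop_point = 0;           // fixed at the moment the stop is requested
    bool stopping = false;

    // Step 1: record that transaction `seq` has started to commit.
    void mark_commit_started(uint64_t seq) {
        assert(seq > 0);  // sequence number 0 is reserved for "nothing yet"
        if (seq > last_commit_started) last_commit_started = seq;
    }

    // Step 2: on stop, remember the transaction that last started to commit.
    void request_stop() {
        if (!stopping) {
            stop_point = last_commit_started;
            stopping = true;
        }
    }

    // Step 3: skip only transactions strictly after the stop point.
    bool should_skip(uint64_t seq) const {
        return stopping && seq > stop_point;
    }
};

// Stand-in for the worker loop: returns how many of the transaction's
// events were applied rather than skipped.
int apply_transaction(const StopTracker &t, uint64_t seq, int n_events) {
    int applied = 0;
    for (int i = 0; i < n_events; ++i) {
        if (t.should_skip(seq)) continue;  // event belongs to a skipped transaction
        ++applied;                         // the real code would apply the event here
    }
    return applied;
}
```

The point of the design is that the skip decision depends on the transaction's position relative to the stop point, not on whether the worker happens to notice the stop in the middle of the transaction. So a transaction at or before the stop point always has all of its events applied.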