Hi!
"Kristian" == Kristian Nielsen <knielsen@knielsen-hq.org> writes:
Kristian> Monty, Kristian> Thanks for the comments so far. Kristian> I have pushed fixes to 10.0-knielsen, and some specific replies are below. Thanks. I will review that today. Kristian> - Kristian. Kristian> Michael Widenius <monty@askmonty.org> writes:
+ --slave-parallel-threads=# + If non-zero, number of threads to spawn to apply in + parallel events on the slave that were group-committed on + the master or were logged with GTID in different + replication domains.
What happens if --slave-parallel-threads is 0 ? Does this mean that there will be no parallel threads? Is this the same as 1 ?
If same as 1, why not have the option to take values starting from 1 ?
Kristian> I think it is logical that 0 means disabled and >0 means enabled with that Kristian> many threads. But I don't mind using 1 as disabled and needing >=2 for Kristian> enabled. Though the ability to use a pool with just 1 thread might be useful Kristian> for testing. It doesn't seem critical, maybe we can revisit this if time Kristian> allows when the basic stuff is done. I come up with one case where 0 is better - In multi-source 0 would mean that we have one thread per connection. You can't get that behaviour if we change the meaning of the slave-parallel-thread variable.
=== modified file 'mysql-test/suite/perfschema/r/dml_setup_instruments.result' --- mysql-test/suite/perfschema/r/dml_setup_instruments.result 2013-03-26 09:35:34 +0000 +++ mysql-test/suite/perfschema/r/dml_setup_instruments.result 2013-08-29 08:18:22 +0000 @@ -38,14 +38,14 @@ order by name limit 10; NAME ENABLED TIMED wait/synch/cond/sql/COND_flush_thread_cache YES YES wait/synch/cond/sql/COND_manager YES YES +wait/synch/cond/sql/COND_parallel_entry YES YES +wait/synch/cond/sql/COND_prepare_ordered YES YES wait/synch/cond/sql/COND_queue_state YES YES wait/synch/cond/sql/COND_rpl_status YES YES +wait/synch/cond/sql/COND_rpl_thread YES YES +wait/synch/cond/sql/COND_rpl_thread_pool YES YES wait/synch/cond/sql/COND_server_started YES YES wait/synch/cond/sql/COND_thread_cache YES YES -wait/synch/cond/sql/COND_thread_count YES YES -wait/synch/cond/sql/Delayed_insert::cond YES YES -wait/synch/cond/sql/Delayed_insert::cond_client YES YES -wait/synch/cond/sql/Event_scheduler::COND_state YES YES
Any idea why the above was deleted? Don't see anything in your patch that could affect that.
Kristian> I believe it does a SELECT with a LIMIT - so when new stuff is added before Kristian> the limit, stuff below gets pushed out. Kristian> But if you ask why the test is that way, I cannot answer. The purpose of some Kristian> of those perfschema tests is somewhat of a mystery, they seem to serve little Kristian> other purpose than add maintenance cost for result file update whenever Kristian> something changes ... ok, I missed the 'limit part'. I can only say 'stupid test' about this... <cut>
@@ -6541,44 +6543,201 @@ MYSQL_BIN_LOG::write_transaction_to_binl }
Could you please document in detail the arguments and return value for the following function.
Kristian> I added the following comment: Kristian> /* Kristian> Put a transaction that is ready to commit in the group commit queue. Kristian> The transaction is identified by the ENTRY object passed into this function. Kristian> To facilitate group commit for the binlog, we first queue up ourselves in Kristian> this function. Then later the first thread to enter the queue waits for Kristian> the LOCK_log mutex, and commits for everyone in the queue once it gets the Kristian> lock. Any other threads in the queue just wait for the first one to finish Kristian> the commit and wake them up. This way, all transactions in the queue get Kristian> committed in a single disk operation. Kristian> The return value of this function is TRUE if queued as the first entry in Kristian> the queue (meaning this is the leader), FALSE otherwise. Kristian> The main work in this function is when the commit in one transaction has Kristian> been marked to wait for the commit of another transaction to happen Kristian> first. This is used to support in-order parallel replication, where Kristian> transactions can execute out-of-order but need to be committed in-order with Kristian> how they happened on the master. The waiting of one commit on another needs Kristian> to be integrated with the group commit queue, to ensure that the waiting Kristian> transaction can participate in the same group commit as the waited-for Kristian> transaction. Kristian> So when we put a transaction in the queue, we check if there were other Kristian> transactions already prepared to commit but just waiting for the first one Kristian> to commit. If so, we add those to the queue as well, transitively for all Kristian> waiters. Kristian> */ Kristian> bool Kristian> MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *entry) Kristian> I also moved some code from the caller into this function, which clarifies Kristian> things by keeping related code together. And I added more comments about what Kristian> is going on with the different list pointers, and who is doing what. Thanks. That will help a lot in making thins easier to understand. <cut>
I did find it confusing that here you threat next_subsequent_commit as a way to avoid recursion, but you use this list for other things in register_wait_for_prior_commit().
Kristian> I added this comment: Kristian> We keep a list of all the waiters that need to be processed in `list', Kristian> linked through the next_subsequent_commit pointer. Initially this list Kristian> contains only the entry passed into this function. Kristian> We process entries in the list one by one. The element currently being Kristian> processed is pointed to by `cur`, and the element at the end of the list Kristian> is pointed to by `last` (we do not use NULL to terminate the list). Kristian> As we process an element, it is first added to the group_commit_queue. Kristian> Then any waiters for that element are added at the end of the list, to Kristian> be processed in subsequent iterations. This continues until the list Kristian> is exhausted, with all elements ever added eventually processed. Kristian> The end result is a breath-first traversal of the tree of waiters, Kristian> re-using the next_subsequent_commit pointers in place of extra stack Kristian> space in a recursive traversal. Great. The one thing I am missing was: The temporary list created in next_subsequent_commit is not used by the caller or any other function.
+ /* Now we need to clear the wakeup_subsequent_commits_running flags. */ + if (list) + { + for (;;) + { + if (list->wakeup_subsequent_commits_running) + { + mysql_mutex_lock(&list->LOCK_wait_commit); + list->wakeup_subsequent_commits_running= false; + mysql_mutex_unlock(&list->LOCK_wait_commit);
Why do we need the mutex above? The only place where we test this variable, as far as I can find, is in wait_for_commit::register_wait_for_prior_commit()
Kristian> Yes, it looks strange. Unfortunately I cannot remember the reason 100% right Kristian> now (I wrote this code 9 months ago). I remember that there _was_ a reason. I Kristian> should have written a comment to explain this, shame on me! Kristian> I think the code basically needs to do the same as Kristian> wait_for_commit::wakeup_subsequent_commits2(), but integrated with the tree Kristian> traversal. Kristian> I agree this needs clarification and looks suspicious, but I wanted to get the Kristian> rest of the comments sent to you today, I will try to re-visit this later (and Kristian> the similar code in wakeup_subsequent_commits2()). ok. Please take a look at this when you are ready with other things. (Add at least a 'todo tag to this code). <cut>
A much easier way to achive the wait is to have the caller lock mysql_mutex_lock(&rpt->LOCK_rpl_thread) and then just release it when it's time for this thread to continue. - No need for signaling or checking if the thread got the signal - No need to have a cond wait here - No need to have a delay_start flag
Kristian> I consider this method simpler and clearer. But it is not critical code, feel Kristian> free to change it if you like. I will do this change, probably today. It will make some of the code a lot simpler. Wait and see... <cut>
Another question: do we really need rpt->stop ? Isn't it enough with using the thd->killed for this ?
Kristian> I think there is a need to generally go through and check for proper handling Kristian> of thd->killed, not just here but in all cases. I did not have time to do this Kristian> yet, and I am in any case a bit unsure about exactly how kill semantics should Kristian> work. Kristian> Now that I think about it, shouldn't they remain different? Kristian> I think thd->killed is set if user does an explicit KILL of the thread? This Kristian> should then abort and fail the currently executing transaction. But it should Kristian> not cause the thread to terminate, should it? That could leave us with no Kristian> threads at all in the replication thread pool, which would cause replication Kristian> to hang. Or should the thread exit and be re-spawned? What should happen if you kill a replication thread is that replication should stop for that master. Kristian> This needs more thought, I think ... certainly something looks not right. After looking at the full code, I think that the logical way things should work is: 'stop' is to be used when you want to nicely take done replication. This means that the current commit groups should be given time to finish. thd->killed should mean that we should stop ASAP. - All not commited things should abort. This is needed in a 'panic shutdown' (like out soon-out-of- power) or when trying to kill the replication thread when one notices that something went horribly wrong (like ALTER TABLE stopping replication).
=== modified file 'sql/sql_class.cc' --- sql/sql_class.cc 2013-06-06 15:51:28 +0000 +++ sql/sql_class.cc 2013-08-29 08:18:22 +0000
+void +wait_for_commit::wakeup() +{ + /* + We signal each waiter on their own condition and mutex (rather than using + pthread_cond_broadcast() or something like that). + + Otherwise we would need to somehow ensure that they were done + waking up before we could allow this THD to be destroyed, which would + be annoying and unnecessary. + */ + mysql_mutex_lock(&LOCK_wait_commit); + waiting_for_commit= false; + mysql_cond_signal(&COND_wait_commit); + mysql_mutex_unlock(&LOCK_wait_commit); +}
In this particular case it looks safe to move the cond signal out of the mutex. I don't see how anyone could miss the signal.
Kristian> Probably. Generally I prefer to write it as above, to not have to make sure Kristian> that moving it out is safe. I agree that this is the way to do in general. However for cases where we are waiting almost once per query, we need to make things faster as this extra wakup will take notable resources!
+ The waiter needs to lock the waitee to delete itself from the list in + unregister_wait_for_prior_commit(). Thus wakeup_subsequent_commits() can not + hold its own lock while locking waiters, lest we deadlock.
lest ?
Kristian> https://en.wiktionary.org/wiki/lest Kristian> But I've changed it to use a more common English wording "as this could lead Kristian> to deadlock". Thanks. It's easier to read the code when you don't need to consult a dictionary...
+ The reason for separate register and wait calls is that this allows to + register the wait early, at a point where the waited-for THD is known to + exist. And then the actual wait can be done much later, where the + waited-for THD may have been long gone. By registering early, the waitee + can signal before disappearing. +*/
Instead of taking locks to ensure that THD will disapper, why not keep all THD objects around until all transactions in a group has been executed?
Kristian> I think it is not just about THD disappearing. IIRC, the more important Kristian> problem is that the THD may have moved on to do something different. So even Kristian> if the data in the other THD is not freed, it may still contain completely Kristian> unrelated data (and thus wrong data, for the accessing thread). Kristian> But this is actually something I consider a very important thing, and one that Kristian> I spent quite some effort on and was quite pleased to get in the current Kristian> state. If arbitrary THD can have pointers into arbitrary other THD, the Kristian> life-time issues can quickly get very nasty, and it is important to handle Kristian> this well to avoid having to understand the entire codebase to avoid subtle bugs. ok. Thanks for the explanation. Regards, Monty