Sergei Golubchik <serg@askmonty.org> writes:
Please, find the review below!
Thanks! Answers inline, for those of your comments not mentioned below I just followed your suggestions. - Kristian.
+ entry->cookie = cookie; + + mysql_mutex_lock(&pending_checkpoint_mutex); + lsn = log_get_lsn(); + flush_lsn = log_get_flush_lsn(); + if (lsn > flush_lsn) { + /* Put the request in queue. + When the log gets flushed past the lsn, we will remove the + entry from the queue and notify the upper layer. */ + entry->lsn = lsn; + if (pending_checkpoint_list_end) { + pending_checkpoint_list_end->next = entry;
please add an assert here that
pending_checkpoint_list_end->lsn < entry->lsn
Right, but this condition can occur, and is valid. (And it is tested - I verified that running the binlog_xa_recover test will cause this assertion to trigger). Binlog checkpoint is completely asynchroneous. There is no ordering enforced, so it is possible for checkpoint N+1 to be processed before N (in practise it is though rather unlikely to occur). If this happens, then we will insert (N) after (N+1) in the list. And we will notify checkpoint (N+1) before (N). This is fine, the upper layer handles this, and checks that all prior checkpoints have been notified when it processes a notification. You can see this in binlog_xa_recover.result, under the test headed "Test that with multiple binlog checkpoints, recovery starts from the last one." master-bin.000004 # Binlog_checkpoint # # master-bin.000002 master-bin.000004 # Binlog_checkpoint # # master-bin.000004 Here DBUG_SYNC is used to make the master-bin.000004 checkpoint occur before master-bin.000003. So what happens is that first InnoDB notifies about master-bin.000004, the upper layer does nothing as master-bin.000003 is still pending. Then when master-bin.000003 is notified, upper layer sees that both 3 and 4 are ready, and logs directly master-bin.000004. I added this comment instead of the assert, hopefully it will make things a bit clearer: /* There is no need to order the entries in the list by lsn. The upper layer can accept notifications in any order, and short delays in notifications do not significantly impact performance. */
+ } else { + pending_checkpoint_list = entry; + } + pending_checkpoint_list_end = entry; + entry = NULL; + } + mysql_mutex_unlock(&pending_checkpoint_mutex); + + if (entry) { + /* We are already flushed. Notify the checkpoint immediately. */ + commit_checkpoint_notify_ha(entry->hton, entry->cookie); + my_free(entry);
I'm a bit worried, whether you're going to do a lot of small allocations. Perhaps it'd be better to use an allocator of a fixed-size objects, with a pool. To avoid going to the heap too often. The same applies to the new code log.cc.
It will not be a lot of allocations. Binlog checkpoints are only made once per binlog rotation, not per-commit or something like that. So it is ok to do a simple malloc().
@@ -3017,8 +3034,113 @@ innobase_checkpoint_request( handlerton *hton, void *cookie) {
+ entry->hton = hton;
why do you need to remember the handlerton here? It's always innodb_hton_ptr, isn't it?
Why do you even pass hton down as an argument, to be able to reuse one checkpoint request function for many engines?
There is no special reason, I agree it is probably not needed. But I added it to be consistent with commit(), prepare(), rollback(), etc. which all take the handlerton as argument. Let me know if you think I should remove entry->hton, and if I should remove hton as argument for commit_checkpoint_request(). For now, I will leave them ...
I'd prefer if both variables would start from "binlog_thread" or both from "binlog_background_thread". Either way, but the same for both variables.
Indeed, I agree.
Could you somehow decide whether the official name for it is a "binlog background thread" or simply a "binlog thread" and use it consistently everywhere? It's also important for the configuration or status
I decided on "binlog background thread", and changed everywhere to be consistent with this.
+ thd= new THD; + thd->system_thread= SYSTEM_THREAD_BINLOG_BACKGROUND; + my_pthread_setspecific_ptr(THR_THD, thd);
^^^ the normal way of setting up a THD, would be to call thd->store_globals(). Why did that not work for you and you had to resort to the manual my_pthread_setspecific_ptr() ?
Well, the reason is that I did not know what is the normal/correct way to create a thread. I think I wanted to have only the necessary stuff in there. Originally I wanted to not have even a THD (as THD has tons of stuff that is not needed in the binlog background thread), but then DEBUG_SYNC does not work. I guess this is the wrong approach, and it is better to have THD and all the usual stuff in it. I changed to use thd->store_globals() as you suggested (and then I also needed to set THD::thread_stack).
In 10.0 you need to use THD_STAGE_INFO() (here and below - everywhere instead of thd_proc_info)
Ok. I'll push it as is to 10.0-base, and then change to THD_STAGE_INFO() when I merge it to 10.0. Thanks for the notification.
/*****************************************************************//** Handle a commit checkpoint request from server layer. We simply flush the redo log immediately and do the notify call.*/
The comment seems to be wrong now, doesn't it?
Good catch, thanks! I changed to this: /*****************************************************************//** Handle a commit checkpoint request from server layer. We put the request in a queue, so that we can notify upper layer about checkpoint complete when we have flushed the redo log. If we have already flushed all relevant redo log, we notify immediately.*/ - Kristian.