Kristian Nielsen <knielsen@knielsen-hq.org> writes:
Ok, thanks a lot for the advice, I will give it another shot.
Thanks to your help, I got it working! It was _really_ nice to see that the new API applies well to PBXT also.

As a bonus, START TRANSACTION WITH CONSISTENT SNAPSHOT now actually becomes consistent! In MySQL, this does not really do much except start a transaction in all engines; it certainly does not ensure any consistency between engines. With this change, it becomes consistent. I added a small Perl test program, tests/consistent_snapshot.pl, that shows this.

I think this is particularly useful for backups; I plan to add a way to get the corresponding binlog position, so START TRANSACTION WITH CONSISTENT SNAPSHOT can be used to make a fully consistent and non-blocking backup (current mysqldump needs FLUSH TABLES WITH READ LOCK, which is not really non-blocking).

I hope you can take a look at the patch (attached) when you get some time and let me know what you think, and whether you see any mistakes. I did it a little differently from what we discussed, as I wanted to minimise the amount of work done while holding the global mutex around commit_ordered().

I also pushed the patch here, in case you want to see or run the full code:

    lp:~maria-captains/maria/mariadb-5.1-mwl116-pbxt

It passes the test suite, but at one point I did see this in the log, and I am not sure what it means. Maybe you can help?

    void XTTabCache::xt_tc_release_page(XTOpenFile*, XTTabCachePage*, XTThread*)(tabcache_xt.cc:409)
    page->tcp_lock_count > 0

Finally a couple of questions:
In particular this, flushing the data log (is this flush to disk?):
    if (!thread->st_dlog_buf.dlb_flush_log(TRUE, thread)) {
        ok = FALSE;
        status = XT_LOG_ENT_ABORT;
    }
Yes, this is a flush to disk.
This could be done in the slow part (obviously this would be ideal).
If we do not flush the data log, then there is a chance that such a committed transaction is incomplete, because the associated data log data has not been committed.
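To illustrate the split discussed above, here is a minimal sketch of a commit whose fast part only fixes the commit order under a global mutex, while the data log flush runs afterwards in the slow part. Everything in it is hypothetical: ToyEngine, commit_slow() and commit_transaction() are invented for illustration and are not PBXT or server code, and only the name commit_ordered() is borrowed from the discussion.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical toy engine. None of these members come from PBXT.
struct ToyEngine {
    std::mutex commit_order_mutex;   // stands in for the global commit mutex
    bool mutex_held = false;         // tracing flag, only meaningful single-threaded
    std::vector<int> commit_order;   // commit order fixed by the fast part
    std::vector<std::string> trace;  // what ran, and whether the mutex was held

    void note(const std::string& what) {
        trace.push_back(what + (mutex_held ? " [mutex held]" : " [no mutex]"));
    }

    // Fast part: only fixes the commit order, no I/O.
    void commit_ordered(int trx_id) {
        commit_order.push_back(trx_id);
        note("order");
    }

    // Slow part: the expensive flush first, then the commit record.
    bool commit_slow(bool flush_ok) {
        note("flush data log");
        if (!flush_ok) {
            note("abort");
            return false;
        }
        note("write commit record");
        return true;
    }
};

// Assumed calling convention: the caller takes the mutex around
// commit_ordered() and releases it before the slow part runs.
bool commit_transaction(ToyEngine& engine, int trx_id, bool flush_ok = true) {
    {
        std::lock_guard<std::mutex> guard(engine.commit_order_mutex);
        engine.mutex_held = true;
        engine.commit_ordered(trx_id);
        engine.mutex_held = false;
    }
    return engine.commit_slow(flush_ok);
}
```

After two successful calls, the trace shows the ordering step tagged "[mutex held]" and both the flush and the commit record tagged "[no mutex]". The sketch deliberately leaves open what should happen when the flush fails after the order has already been fixed.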
The data log flush is done in commit, but I could not see where a similar data log flush is done in prepare(). It seems prepare() mostly adds a "prepare" record and flushes the transaction log. Is it correct that no data log flush happens in prepare()? If so, don't we have the same problem?

Suppose we prepare() in PBXT and write (and flush) the transaction into the binary log. Then we crash. When the server comes back up, it will try to recover the transaction inside PBXT, but that will not be possible if the data log was lost due to no flush, right?

Final question: In commit() we call xt_tab_restrict_rows(). It seems to be delayed checking for deferred foreign key constraints or something like that? If it is, then shouldn't it be done in prepare() (it's wrong to roll back with an error in commit() after a successful prepare())? I see the #ifdef XT_IMPLEMENT_NO_ACTION around the call, so I suppose this code is not actually used, but I just wondered ...

 - Kristian.
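To make the crash scenario above concrete, here is a toy model of it. It is only a sketch under stated assumptions: ToyLog, ToyEngine and all their members are hypothetical and do not correspond to PBXT structures, and whether PBXT really behaves like the flush_data_log == false case is exactly the open question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical log: appended entries stay in memory until flush().
struct ToyLog {
    std::vector<std::string> buffered;  // lost on crash
    std::vector<std::string> durable;   // survives a crash

    void append(const std::string& entry) { buffered.push_back(entry); }
    void flush() {
        durable.insert(durable.end(), buffered.begin(), buffered.end());
        buffered.clear();
    }
    void crash() { buffered.clear(); }
};

// Hypothetical engine with a data log and a transaction log.
struct ToyEngine {
    ToyLog data_log;
    ToyLog xact_log;

    void write_row(const std::string& row) { data_log.append(row); }

    // Assumption under test: prepare() always flushes the transaction
    // log, but flushes the data log only if flush_data_log is true.
    void prepare(bool flush_data_log) {
        if (flush_data_log) data_log.flush();
        xact_log.append("PREPARE");
        xact_log.flush();
    }

    void crash() {
        data_log.crash();
        xact_log.crash();
    }

    // Recovery succeeds only if the prepare record and all row data
    // both made it to disk.
    bool can_recover_prepared(std::size_t expected_rows) const {
        bool prepared = std::find(xact_log.durable.begin(),
                                  xact_log.durable.end(),
                                  "PREPARE") != xact_log.durable.end();
        return prepared && data_log.durable.size() == expected_rows;
    }
};

// Write one row, prepare, crash, then ask whether recovery is possible.
bool survives_crash_after_prepare(bool flush_data_log) {
    ToyEngine engine;
    engine.write_row("row 1");
    engine.prepare(flush_data_log);
    engine.crash();
    return engine.can_recover_prepared(1);
}
```

In this model, survives_crash_after_prepare(false) returns false, because the prepare record is durable but the row data is gone, while survives_crash_after_prepare(true) returns true.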