Paul McCullagh <paul.mccullagh@primebase.org> writes:
The easiest way to do this would be to add a parameter to xn_end_xact() that indicates that the log should not be written or flushed.
Ok, I gave it a shot, but I had some problems due to not knowing the PBXT code sufficiently ...
In xn_end_xact(), the last parameter to the call to xt_xlog_log_data() determines what should happen:
    #define XT_XLOG_NO_WRITE_NO_FLUSH 0
    #define XT_XLOG_WRITE_AND_FLUSH 1
    #define XT_XLOG_WRITE_AND_NO_FLUSH 2
Without write or flush, this is a very fast operation. The transaction is still committed and ordered; it is just not durable.
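To make the three modes concrete, here is a minimal sketch of a commit path that takes the write mode as a parameter. Everything in it (the `demo_` names, the three counters) is hypothetical and only models the behaviour described above; it is not the PBXT code and says nothing about the real signature of xn_end_xact() or xt_xlog_log_data().

```c
#include <stddef.h>

/* Hypothetical stand-ins mirroring the three XT_XLOG_* values above. */
enum demo_write_mode {
    DEMO_NO_WRITE_NO_FLUSH  = 0,
    DEMO_WRITE_AND_FLUSH    = 1,
    DEMO_WRITE_AND_NO_FLUSH = 2
};

/* Toy log: three positions that can only move forward. */
struct demo_log {
    size_t appended; /* bytes appended to the in-memory log buffer */
    size_t written;  /* bytes handed to the OS (simulated write) */
    size_t flushed;  /* bytes known to be durable (simulated fsync) */
};

/*
 * Append a commit record of rec_len bytes and return the log position
 * just past it. The record is ordered in the log in every mode; only
 * DEMO_WRITE_AND_FLUSH makes it durable before returning.
 */
static size_t demo_log_commit(struct demo_log *log, size_t rec_len,
                              enum demo_write_mode mode)
{
    log->appended += rec_len;
    if (mode != DEMO_NO_WRITE_NO_FLUSH)
        log->written = log->appended;
    if (mode == DEMO_WRITE_AND_FLUSH)
        log->flushed = log->written;
    return log->appended;
}
```

The returned position is what a caller would have to remember in the "fast" part, so that the "slow" part can later check whether the log has been flushed that far.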
I notice that xs_end_xact() does a number of things. I am wondering if all of these should be in the "fast" part in commit_ordered(), or if some should be done in the "slow" part along with the log flush?

In particular this, flushing the data log (is this flush to disk?):

    if (!thread->st_dlog_buf.dlb_flush_log(TRUE, thread)) {
        ok = FALSE;
        status = XT_LOG_ENT_ABORT;
    }

and this, at the end concerning the "sweeper":

    if (db->db_sw_faster)
        xt_wakeup_sweeper(db);

    /* Don't get too far ahead of the sweeper! */
    if (writer) {
    ...

Can you help suggest if these should be done in the "fast" part, or in the "slow" part?

Also, this statement definitely needs to be postponed to the "slow" part I guess:

    thread->st_xact_data = NULL;
Then when the actual commit is called, we check the current log flush position against the flush position we need. If it is past our position then this is a NOP.
I think I can do this with a condition like this:

    if (xt_comp_log_pos(self->commit_fastpart_log_id,
                        self->commit_fastpart_log_offset,
                        xl_flush_log_id,
                        xl_flush_log_offset) <= 0)

But I am wondering if I need to take any locks around reading xl_flush_log_id and xl_flush_log_offset? Or can one argue that a dirty read could be ok (as long as it's atomic) as the values are probably monotonic?
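As an illustration of why the locking question matters, here is a small self-contained sketch of comparing (log id, offset) pairs. The `demo_` names and the struct are hypothetical, and `demo_comp_log_pos()` is only assumed to behave like xt_comp_log_pos() (negative, zero or positive); it is not the PBXT implementation.

```c
#include <stdint.h>

/* Hypothetical log position: a log file id plus an offset inside it. */
struct demo_log_pos {
    uint32_t log_id;
    uint64_t offset;
};

/* Order positions by log id first, then by offset. Returns <0, 0 or >0. */
static int demo_comp_log_pos(struct demo_log_pos a, struct demo_log_pos b)
{
    if (a.log_id != b.log_id)
        return a.log_id < b.log_id ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* Non-zero when the log is flushed at least as far as `needed`. */
static int demo_commit_is_durable(struct demo_log_pos needed,
                                  struct demo_log_pos flushed)
{
    return demo_comp_log_pos(needed, flushed) <= 0;
}
```

Each field being monotonic is not quite the same as the pair being monotonic. Suppose the flush position moves from (3, 900) to (4, 10) while a commit needs (4, 500). Neither real position satisfies the check, but a reader that picks up the new log id together with the old offset sees (4, 900) and wrongly concludes the commit is durable. Whether such a mixed read can actually happen depends on how the two fields are updated in PBXT, which this sketch does not claim to know.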
If not, then we need to call xlog_append() with no data. This will do a group commit on the log.
Is it safe to call xlog_append() with no data even if the log has been flushed past the current position already? (else some locking seems definitely needed).
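The behaviour being asked about can be sketched as a flush that is harmless to repeat. This is a toy, single-threaded model with hypothetical `demo_` names; it makes no claim about what xlog_append() really does when called with no data, and it ignores locking entirely.

```c
#include <stddef.h>

/* Toy model of the log state relevant to the "slow" part of a commit. */
struct demo_flush_state {
    size_t   appended;    /* end of the log, including unflushed records */
    size_t   flushed;     /* everything before this position is durable */
    unsigned flush_calls; /* number of simulated disk flushes */
};

/*
 * Make the log durable up to at least `needed`.
 * If the log is already flushed that far, this is a no-op, so calling it
 * "too late" costs nothing. Otherwise one flush covers every record
 * appended so far, which is the group commit effect.
 */
static void demo_flush_up_to(struct demo_flush_state *st, size_t needed)
{
    if (needed <= st->flushed)
        return;
    st->flushed = st->appended;
    st->flush_calls++;
}
```

With three transactions whose fast parts ended at positions 10, 20 and 30, the first slow part to run flushes everything up to 30, and the other two find their positions already covered and return without touching the disk.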
It was a bit difficult to explain, so please ask if anything is not clear.
Hopefully you can help with some of the above points, then I can give it another go with fresh eyes and maybe show you a patch. (If I get to that point, I will probably also need some advice on the proper error handling)...

Anyway, from what you wrote and from what I see in the code, it seems the API I propose is general enough to fit well with PBXT, which is good and what I wanted to check (even if xn_end_xact() may need to be taken apart a bit to properly split into a "fast" and a "slow" part).

 - Kristian.