Hi Monty, Thanks for your review! I generally followed your suggestions, except for this:
+ if (entry->begin_event->write(&log_file)) + return ER_ERROR_ON_WRITE; + + DBUG_EXECUTE_IF("crash_before_writing_xid", + { + if ((write_cache(cache))) + DBUG_PRINT("info", ("error writing binlog cache")); + else + flush_and_sync(); + + DBUG_PRINT("info", ("crashing before writing xid")); + abort(); + }); + + if (write_cache(cache)) + return ER_ERROR_ON_WRITE; + + if (entry->end_event->write(&log_file)) + return ER_ERROR_ON_WRITE; + + if (entry->incident_event && entry->incident_event->write(&log_file)) + return ER_ERROR_ON_WRITE; + + if (cache->error) // Error on read + return ER_ERROR_ON_READ;
It would be better to test this after 'write_cache'. There is no point in writing anymore if this fails....
Are you sure? Couldn't this result in a half-written event group (missing COMMIT or incident event) in cases where it wouldn't happen before? Anyway, this is unchanged from the existing code, I am reluctant to change it as part of the group commit patch ... ---- Otherwise I fixed everything you pointed out. Below a few clarifying comments in case you are interested.
+ trx_data->using_xa= TRUE; + if (xid)
The above looks strange. How can we have xa without an xid ?
After some investigation, I found this can happen in XA COMMIT ... ONE PHASE. I added a comment. It was actually a bug, in this case of XA COMMIT ... ONE PHASE it should use a COMMIT query event (not NULL). Fixed; this also removes the case of end_event==NULL that you commented on in other parts of the code.
/* + We only bother to write to the binary log if there is anything + to write. */ + if (my_b_tell(cache) > 0) + {
Can't we ignore transactions with cache == 0 on the upper level ? There is no reason I can come up with why we put these in the queue.
It can happen for example for this: SET sql_log_bin = 0; BEGIN; INSERT INTO innodb_table VALUES (3); INSERT INTO pbxt_table VALUES (4); COMMIT; In this case, there is nothing to write to the binlog. However, the code is still invoked, as binlog needs to act as a transaction coordinator for 2-phase commit between the two storage engines. (This is btw. unchanged in the patch compared to the original code).
The above is better written as:
status_var_increment(thd->status_var.ha_prepare_count); if ((err= ht->prepare(ht, thd, all))) { my_error(ER_ERROR_DURING_COMMIT, MYF(0), err); goto err; }
I did it like this, to avoid swapping status_var_increment() and ht->prepare(): err= ht->prepare(ht, thd, all) status_var_increment(thd->status_var.ha_prepare_count); if (err) { my_error(ER_ERROR_DURING_COMMIT, MYF(0), err); goto err; }
+# Check that the binlog position reported by InnoDB is the correct one +# for the end of the second transaction (as can be checked with +# mysqlbinlog). +let $MYSQLD_DATADIR= `SELECT @@datadir`; +--exec grep 'InnoDB: Last MySQL binlog file position' $MYSQLD_DATADIR/../../log/mysqld.1.err | tail -1
Is there another way to get the position without using grep ?
Another option would be to store the startup position in some user variables (assuming this would be useful for anyone else).
Yes, I think this is a good idea. I added a worklog for it: MWL#191.
(as grep probably don't work on windows)
As we already have regexp in mysqltest, it should be trivial to do our own simple version of grep there. Let's see if we can get someone to implement the grep.
Actually there should be several ways to get `grep` and `tail` on Windows, including at least one made by Microsoft and several that are Free Software. I would personally prefer to not re-invent the wheel, but rely on existing packages for this. We could have a `--source include/have_posix_tools' so we do not fail the test if the posix commands are not available. Anyway, in this case, the MWL#191 is the best solution I think. I added to the worklog to fix the test to not use grep, and until then I will just put `--source include/not_windows.inc' with a reference to MWL#191.
+# Actually the output from this currently shows a bug. +# The injected IO error leaves partially written transactions in the binlog in +# the form of stray "BEGIN" events. +# These should disappear from the output if binlog error handling is improved.
Do you have a worklog number for the above. If yes, please add it to the source too.
Done (it is MySQL Bug#37148 and WL#1790).
+int +MYSQL_BIN_LOG::write_transaction(group_commit_entry *entry) +{ + binlog_trx_data *trx_data= entry->trx_data; + IO_CACHE *cache= &trx_data->trans_log; + /* + Log "BEGIN" at the beginning of every transaction. Here, a transaction is + either a BEGIN..COMMIT block or a single statement in autocommit mode. The + event was constructed in write_transaction_to_binlog(), in the thread + running the transaction. + + Now this Query_log_event has artificial log_pos 0. It must be + adjusted to reflect the real position in the log. Not doing it + would confuse the slave: it would prevent this one from + knowing where he is in the master's binlog, which would result + in wrong positions being shown to the user, MASTER_POS_WAIT + undue waiting etc. + */
The above old comment doesn't make sense anymore in this place of the code. Please remove or move it to a better place. (I tried to find a better place, but couldn't find it)
Moved the first part to write_transaction_to_binlog(); the last part I deleted as there is a similar better comment in write_cache(). - Kristian.