Daniel Black <notifications@github.com> writes: Thanks for your patch. I agree that these kinds of status information could be useful. But I have some parts that are not clear, and I want you to think about the questions below and answer with your thoughts. This is to make sure you understand yourself the issue in detail; to help me understand it; and in some cases maybe some changes should be made.
The following global status variables where added: * binlog_group_commit_reason_count * binlog_group_commit_reason_usec
So, these names seem a bit odd. Neither "count" nor "usec" makes much sense as a reason for something to happen. "_timeout" would be a better reason that "_usec", but I am not sure of a better name for "_count". The good thing about these names of course is that they match the corresponding --binlog-commit-wait-* option. What do you think of something like binlog_commit_trigger_count binlog_commit_trigger_timeout instead ?
* binlog_group_commit_reason_transaction * binlog_group_commit_reason_immediate
binlog_group_commit_reason_transaction is a result of ordered transaction that need to occur in the same order on the slave and can't be parallelised.
binlog_group_commit_reason_immediate is caused to prevent stalls with row locks as described in log.cc:binlog_report_wait_for. This immediate count is also counted a second time in binlog_group_commit_reason_transaction.
Please re-read that explanation, and consider: do you think an average user/DBA will fully understand what these status variables mean? Honestly, I am wondering if you can even yourself explain exactly what the difference is between the two :-). Can you give two simple examples of transactions, one example of which will increment the _transaction status variable and one of which will increment the _immediate? Do you think the distinction between the two is useful for the user? If not, maybe just have a single status variable, like binlog_commit_trigger_lock_wait for example?
Overall binlog_group_commits = binlog_group_commit_reason_count + binlog_group_commit_reason_usec + binlog_group_commit_reason_transaction
Is this also true if --binlog-commit-wait-count=0? If so, what will the values of the different status variables be in this case? Now, to the patches.
+ { + if (++count >= opt_binlog_commit_wait_count) + { + group_commit_reason_count++; + return; + } + if (unlikely(e->thd->has_waiter)) + { + group_commit_reason_transaction++; return; + } + }
So this increments the global status variables directly, I think? How does the thread locking work? Is it possible for a reader to see a corrupt value (eg. one word of the old value and one word of the new) on 32-bit platform? How is this handled for other similar status variables? It would not be good if we had to take some LOCK_status or something inside the critical group commit code. With respect to the test cases: Can you please comment yourself on the different places where you added output of the status variables, and explain why this output will always be the same, no matter how threads are scheduled on the machine running the test? (This is the critical part of most tests related to replication; because of the inherently multi-threaded nature of such tests, a lot of care is needed to make sure the test will not produce different output on different test runs depending on the speed of the host or how threads are scheduled on a loaded machine). - Kristian.