Hi!
"Pavel" == Pavel Ivanov <pivanof@google.com> writes:
Pavel> Here is a reproduction test case. I took the vanilla tarball of Pavel> 10.0.8, applied to it the following patch: Pavel> @@ -131,6 +131,11 @@ bool trans_begin(THD *thd, uint flags) Pavel> DBUG_ASSERT(!thd->locked_tables_mode); Pavel> +#ifdef HAVE_REPLICATION Pavel> + if (thd->slave_thread && (thd->variables.option_bits & OPTION_BEGIN)) Pavel> + abort(); Pavel> +#endif Pavel> + Pavel> if (thd->in_multi_stmt_transaction_mode() || Pavel> (thd->variables.option_bits & OPTION_TABLE_LOCK)) Pavel> { Pavel> Then I compiled and ran the following test: Pavel> --source include/master-slave.inc Pavel> connection master; Pavel> create table t (n int); Pavel> insert into t values (1); Pavel> show binlog events; Pavel> sync_slave_with_master; Pavel> That test had this output: Pavel> include/master-slave.inc Pavel> [connection master] Pavel> create table t (n int); Pavel> insert into t values (1); Pavel> show binlog events; Pavel> Log_name Pos Event_type Server_id End_log_pos Info Pavel> master-bin.000001 4 Format_desc 1 248 Server ver: Pavel> 10.0.8-MariaDB-debug-log, Binlog ver: 4 Pavel> master-bin.000001 248 Gtid_list 1 273 [] Pavel> master-bin.000001 273 Binlog_checkpoint 1 313 master-bin.000001 Pavel> master-bin.000001 313 Gtid 1 351 GTID 0-1-1 Pavel> master-bin.000001 351 Query 1 436 use `test`; create table t (n int) Pavel> master-bin.000001 436 Gtid 1 474 BEGIN GTID 0-1-2 Pavel> master-bin.000001 474 Query 1 561 use `test`; insert into t values (1) Pavel> master-bin.000001 561 Query 1 630 COMMIT Pavel> And then it said that slave died with the stack trace Pavel> sql/transaction.cc:139(trans_begin(THD*, unsigned int))[0x788e20] Pavel> sql/log_event.cc:6478(Gtid_log_event::do_apply_event(rpl_group_info*))[0x93a685] Pavel> sql/log_event.h:1341(Log_event::apply_event(rpl_group_info*))[0x5ca108] Pavel> sql/slave.cc:3191(apply_event_and_update_pos(Log_event*, THD*, Pavel> rpl_group_info*, rpl_parallel_thread*))[0x5c0da8] Pavel> sql/slave.cc:3464(exec_relay_log_event)[0x5c1498] Pavel> sql/slave.cc:4516(handle_slave_sql)[0x5c44e9] Pavel> Which means that slave tries to execute BEGIN event while OPTION_BEGIN Pavel> is set which shouldn't ever happen. The assert you put in the code doesn't show that anything is wrong. The reason is the following code in log_event.cc: thd->variables.option_bits|= OPTION_BEGIN | OPTION_GTID_BEGIN; DBUG_PRINT("info", ("Set OPTION_GTID_BEGIN")); trans_begin(thd, 0); In other words, we do set OPTION_BEGIN just before calling trans_begin(), so the assert is wrong. Pavel> And to answer all of your other questions, our main concern is simple: Pavel> master and slave should always have absolutely the same database Pavel> contents, absolutely the same tables and absolutely the same data in Pavel> those tables. Correct. Pavel> Any difference in those can be created only by humans Pavel> and must be resolved only by humans. Absolutely no magic please, it's Pavel> unacceptable, whenever inconsistency is detected replication must stop Pavel> and wait for human intervention. It's not enough to have the same data Pavel> eventually. And if any DBA requests a different behavior he doesn't Pavel> understand what kind of troubles waits him in the future. The problem you are not taking into account for is that while executing CREATE TABLE or DROP TABLE, the slave is inconsistent until we write things to the binary log and there was no before no automatic way to resolve this without user intervention. The new code I added will not make the slave less reliable in any relevant scenario I can think off. If there is an inconsistency between slave and master, it will be detected when data is applied. Pavel> As a consequence to that slave shouldn't execute any implicit commits, Pavel> because it's impossible to generate binlogs on master that will Pavel> require implicit commits. The original code that only tested OPTION_BEGIN did generate implicit commits in a lot of cases. That is why I introduced OPTION_GTID_BEGIN, just to avoid implicit commits. Pavel> Another consequence is CREATE TABLE Pavel> statement should never automatically delete the table if it already Pavel> exists. Who knows how the existing table was created and how important Pavel> the data that is stored in it? This scenario doesn't really matter for a slave. One should never create a table on the slave that does not exist on the master. The scenarios when this makes sense is very very rare and in this case on can set slave_ddl_exec_mode to IGNORE to handle this. What we do know when we see a CREATE TABLE in the binary log is that the table did not exist on the master and we should create such an empty table slave. The most likely scenario for this to happen is that the slave died while a CREATE TABLE was executed and before the binary log was written. The new code is done exactly to cope with this scenario. Pavel> Definitely not MariaDB. These questions Pavel> should be answered by human and human should decide whether it's ok to Pavel> delete existing table. Again for the same reason DROP TABLE should Pavel> never be silently ignored if the table doesn't exist -- who knows what Pavel> happened and why it doesn't exist when it did exist on master? That Pavel> should be investigated by human. The most likely reason for that table not existing is again that the slave died in the middle of a drop. The problem with the old way of handling CREATE & DROP is that if you do a lot of CREATE and DROP tables (which a lot of applications are actually doing) there is a very big change that a slave will not be able to recover from a crash. The new code is safe for all practical purposes. Any inconsistencies in data, which is not likely to occur of the new behavior with CREATE or DROP TABLE will be detected during execution of DML's. Pavel> Of course world is not perfect. If slave can crash in the middle of Pavel> CREATE TABLE and not rollback the table creation on restart, that's a Pavel> problem. But MariaDB should not assume that if table is exists then Pavel> it's there because of a crash, there could be other reasons. If slave Pavel> can crash while executing DROP TABLE and not rollback that on restart, Pavel> that's a problem too, but again it must be resolved by human (or by Pavel> code that does a proper rollback). There is no possiblity to rollback CREATE or DROP in the current code. The only possiblity we have now is to allow these DDL's to succeed. I personally think this is the safest way to go forwards, as we in this case do know that for this table we will be consistent between master and slave after the execution of the statement. This is not true for the normal slave_exec_mode, which is why this is not on by default. Pavel> And as you rightfully noted temp tables can behave weirdly with Pavel> replication, that's why we have code to prohibit creation of temp Pavel> tables on masters. This may be ok for Google, but it's not ok for most other MariaDB users. Pavel> CREATE IF NOT EXISTS can result in different data Pavel> on master and slave, that's why we prohibit execution of such Pavel> statements (as well as DROP IF EXISTS). And for any other feature that Pavel> may misbehave in replication we will put some blocks in place to avoid Pavel> any breakage. Agree that CREATE IF NOT EXIST can in theory give different data between master and slave. This is however not the case with DROP IF EXISTS. Pavel> So that's our main concern and our main expectation of how MariaDB Pavel> should behave. And we would really appreciate if that behavior didn't Pavel> silently change to break the "no magic by default" expectation. The change we did was not silently done. It was documented and blogged in several places. I added the slave_ddl_exec_mode variable to ensure that companies like yours should be able to change the behavior to the old one if they would have strange master/slave setups where it's ok that the data is different between master and slave. For normal people that needs something that will 'just work' and do not know what to do if the slave suddenly would fail to restart, the current defaults are better as behavior for CREATE or DROP will never cause inconsistencies between master and slave and will make the slave more reliable on restarts. Regards, Monty