Martin Kaluznik <martin.kaluznik@gmail.com> writes:
good to do now. And I would also like to ask you for a review of the existing code.
Github link: https://github.com/f4rnham/server/tree/MDEV-7502
Diff: https://github.com/MariaDB/server/compare/10.1...f4rnham:MDEV-7502.diff
Please find my review inline in the patch, below. General stuff:
I have created a new log event as a way to signal to the slave the successful end of provisioning; it is the last event the IO thread writes to the relay log before termination, and the last event the SQL thread reads from the relay log. The test cases use the Slave_SQL_Running value from the SHOW SLAVE STATUS command to wait until provisioning is completed. It works there, but I am not sure about its convenience in real-world usage.
Hm, I am not sure either. Another option would be for LOAD DATA FROM MASTER to block until provisioning is done (failed or successful).

Other than review, please check Buildbot results carefully for any failures. Pay particular attention to the valgrind build (for memory leaks). Also pay special attention to random failures in your new test cases that happen only occasionally. Such sporadic / random failures often turn up in replication tests.

You need this in test cases that use debug_dbug:

  --source include/have_debug.inc

What is the largest data set you tested with? Please do a (manual) test with a large data set. For example, you could run sysbench with a 10GB or 50GB database, let it run for say 10 minutes, and 5 minutes in start a LOAD DATA FROM MASTER on a slave off the server running the sysbench.

Please check the comments below on how (and whether) the whole approach of provisioning in parallel with active transactions is correct and always makes the slave identical to the master at the end.

Is there any issue with foreign keys? An insert to one table could fail if a foreign key constraint against another table is not yet satisfied due to rows not yet provisioned. A CREATE TABLE referencing another table might fail. Maybe look at whether mysqldump does something to avoid these kinds of issues. A minimal solution is to investigate possible problems and document them as limitations. Eg. it is reasonable to document that users must disable foreign keys during provisioning, if necessary.

What happens if the server that does LOAD DATA FROM MASTER provisioning has --log-bin and --log-slave-updates? What happens if a third-level slave is connected to the server that does LOAD DATA FROM MASTER? I think this needs to be tested.

Comments on the patch:
+BOOLEAN _db_keyword_locked_(CODE_STATE *cs, const char *keyword, int strict)
Can you explain the race that results without this locking? Is it because _db_keyword_() is scanning the list without any locking, so if the list changes during the scan we can crash?
+#define DBUG_EXECUTE_LOCKED(keyword,a1) \
+  do {if (_db_keyword_locked_(0, (keyword), 0)) { a1 }} while(0)
You need an empty definition for the non-debug build (but I guess you already saw that in the Buildbot run).
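Something like this, following the pattern of the other DBUG_* macros in dbug.h (sketch):

  #else  /* DBUG_OFF */
  #define DBUG_EXECUTE_LOCKED(keyword,a1) do { } while(0)
  #endif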
+--let $slave_param= Slave_SQL_Running
+--let $slave_param_value= No
+--let $slave_error_param= Last_SQL_Errno
+--source include/wait_for_slave_param.inc
Is there a reason not just to use include/wait_for_slave_sql_to_stop.inc here?
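That would reduce these four lines to just (assuming its error handling is acceptable here):

  --source include/wait_for_slave_sql_to_stop.inc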
+# Ensure that master and slave log names and positions are equal
+--connection master
+--let $binlog_file_master= query_get_value(SHOW MASTER STATUS, File, 1)
+--let $binlog_pos_master= query_get_value(SHOW MASTER STATUS, Position, 1)
+--connection slave
+--let $binlog_file_slave= query_get_value(SHOW SLAVE STATUS, Master_Log_File, 1)
+--let $binlog_pos_slave= query_get_value(SHOW SLAVE STATUS, Read_Master_Log_Pos, 1)
Is there a reason you do not use GTIDs here instead (or in addition)?
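For example, roughly (untested sketch; the exact slave-side variable to compare needs thought):

  --connection master
  --let $gtid_master= `SELECT @@gtid_binlog_pos`
  --connection slave
  --let $gtid_slave= `SELECT @@gtid_slave_pos`
  if ($gtid_master != $gtid_slave)
  {
    --die GTID positions differ between master and slave
  }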
+# echo Binlog file name master - $binlog_file_master;
+# echo Binlog file name slave - $binlog_file_slave;
+# echo Binlog position master - $binlog_pos_master;
+# echo Binlog position slave - $binlog_pos_slave;
Please remove this left-over debugging code.
+++ b/mysql-test/suite/rpl/t/rpl_provisioning_base.inc
+eval CREATE TABLE test.test_1 (a int, PRIMARY KEY (a)) ENGINE=$engine_type;
+eval CREATE TABLE test.test_2 (a int, b int, c int, PRIMARY KEY (a)) ENGINE=$engine_type;
Try adding also some foreign key constraints to the tables for the test.
+# Start provisioning
+--connection slave
+LOAD DATA FROM MASTER;
At this point, master has full binlogs. So just START SLAVE would do the same thing, just by re-playing the binlog. I think the test should use FLUSH LOGS on the master a couple of times during the initial setup of data, and then PURGE BINARY LOGS to delete some (but not all) of that binlog data, to test that the slave can still provision without complete binary logs.
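Something along these lines (a sketch; how much data to load between the flushes is up to you):

  --connection master
  # ... load first part of the test data ...
  FLUSH LOGS;
  # ... load more test data ...
  FLUSH LOGS;
  # Delete the oldest binlog file(s), keeping the current one
  --let $purge_to= query_get_value(SHOW BINARY LOGS, Log_name, 2)
  eval PURGE BINARY LOGS TO '$purge_to';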
+SET GLOBAL debug_dbug="+d,provisioning_wait";
+SET GLOBAL debug_dbug="-d,provisioning_wait";
Earlier, when I did this, I ended up with tons of debug output in the error log after setting the "-d,..." option, which is annoying - setting an empty "+d" option was enabling the debug trace. Maybe this is fixed now; if so, all the better. But if you do get this problem (stray debug trace in the error log), you can use the following pattern instead:

  SET @old_dbug= @@GLOBAL.debug_dbug;
  ...
  SET GLOBAL debug_dbug= @old_dbug;
+#
+# Attempt to start provisioning while not in GTID mode
+#
Is this an old comment? It seems none of the tests really use any GTID. Does provisioning work in GTID mode? It should be tested.
+#
+# Attempt to start provisioning while SQL thread is running
+#
What if the IO thread is running? That should probably be tested also.
+# Creation of database `test` depends purely on timing and it has nothing to do with this test case
+--disable_warnings
+DROP DATABASE IF EXISTS test;
What is the purpose of this? It sounds like there is a potential race in the test case? Maybe there is a better way to handle it.
+START SLAVE;
+
+# Cleanup after test case
+--connection slave
+STOP SLAVE;
+RESET SLAVE;
+# Creation of database `test` depends purely on timing and it has nothing to do with this test case
It is better to use the include/start_slave.inc and include/stop_slave.inc. They prevent some races, which might solve some of the problems that your comment about timing seems to hint at. Also, it is usually best to synchronise the slave with the master before stopping, to avoid it being stopped at a random point.
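For example, the usual cleanup pattern (sketch):

  --connection master
  --sync_slave_with_master
  # sync_slave_with_master leaves us on the slave connection
  --source include/stop_slave.inc
  RESET SLAVE;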
+#
+# Bug: Deadlock occurs when IO thread fails with error before dump request
+#
What do you mean? That the current patch has a bug triggered by this test case? Or that there was a bug, fixed now, and this is a test for it? Please explain better in the comment.
diff --git a/mysql-test/suite/rpl/t/rpl_provisioning_stress.test b/mysql-test/suite/rpl/t/rpl_provisioning_stress.test
new file mode 100644
index 0000000..8141015
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_provisioning_stress.test
@@ -0,0 +1,48 @@
+--skip Test requires manual setup
+
+# Stress test for MDEV-7502 Automatic provisioning of slave
+# Using Random Query Generator v2.2.0 release, downloadable from https://launchpad.net/randgen/+download
I think it is great that you made this kind of test. But maybe it would be better to put it somewhere else, so as not to have a test in the normal test suite that is always unconditionally skipped. Eg. it could be put in /tests/, and one would copy it into the test suite instead of removing the --skip.
+void Provisioning_done_log_event::print(FILE* file,
+  print_header(&cache, print_event_info, FALSE);
+  my_b_write_string(&cache, "\tProvisioning done\n");
Hm, this can occur in the relay log, but not in the binlog, right? I think it is ok not to have a test of this in mysql-test-run, we generally can only test the "short-form" in there to avoid dependency on time and such. Just be sure to check manually that the output is correct.
@@ -9718,23 +9760,29 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi)
+      // In provisioning mode, silently ignore table not found error
+      if (!rgi->thd->slave_thread || !rli->mi->provisioning_mode ||
+          !provisioning_error_code(actual_error))
Better use thd than rgi->thd here (just for consistency).
+ // Ignore certain errors during provisioning
What thoughts did you have on whether just ignoring these kinds of errors always yields a correct provisioning result? I suppose since we are copying data fresh from the master, it should not be possible to miss a "real" error this way.

Will things work correctly if we have multiple rows in a single Rows_log_event, and some of them succeed while others fail? In both MyISAM and InnoDB?

Can an argument be made to prove that, if we ignore an insert or update here, we are sure to later write the correct data as part of provisioning? For both InnoDB and MyISAM? Similarly, can it be proven that it is impossible to ignore a delete and still later wrongly provision the deleted row? I suppose it would be something with binlog ordering and isolation modes for the InnoDB case - and some table locking semantics wrt. MyISAM?
+
+  /*
+    Semaphore used by LOAD DATA FROM MASTER command for test cases, its role is
+    to ensure, that slave's IO thread is connected to master before test case
+    can continue.
+ volatile bool dump_requested_semaphore;
Shouldn't this be under #ifndef DBUG_OFF ?
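Ie.:

  #ifndef DBUG_OFF
  volatile bool dump_requested_semaphore;
  #endif

(and correspondingly for the code that sets and reads it).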
diff --git a/sql/rpl_provisioning.cc b/sql/rpl_provisioning.cc
+void quote_name(char const *name, char *buff)
Try to use append_identifier() instead. Or maybe my_snprintf("%`s"). I notice that append_identifier() handles a configuration where "" is used to quote identifiers instead of ``, you should probably test if this causes a bug in your code, btw.
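Something like this (a sketch, using the append_identifier() declaration from sql_show.h, and a String instead of DYNAMIC_STRING as suggested further below):

  append_identifier(thd, &str, database->str, database->length);
  str.append('.');
  append_identifier(thd, &str, name->str, name->length);

That would also take care of the "" quoting configuration mentioned above.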
+provisioning_send_info::provisioning_send_info(THD *thd_arg)
+  : thd(thd_arg)
+  , connection(new Ed_connection(thd))
+  , row_batch_end(NULL)
+  , row_buffer(my_malloc(ROW_BUFFER_DEFAULT_SIZE, MYF(0)))
+  , row_buffer_size(ROW_BUFFER_DEFAULT_SIZE)
+  , phase(PROV_PHASE_INIT)
+  , error(0)
+  , error_text(NULL)
+  , event_conversion_cache(NULL)
+{
+}
So this constructor will probably cause the server to crash in case of out-of-memory, right? Usually, code tries to fail gracefully with ER_OUT_OF_RESOURCES / ER_OUTOFMEMORY if malloc() fails (though it's not well tested, probably).
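A sketch of the usual graceful pattern (moving the allocation out of the initialiser list; MY_WME makes my_malloc() itself report the out-of-memory error):

  row_buffer= my_malloc(ROW_BUFFER_DEFAULT_SIZE, MYF(MY_WME));
  if (!row_buffer)
  {
    // Mark the object as failed; the caller must check before using it
    error= ER_OUTOFMEMORY;
  }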
+void provisioning_send_info::close_tables()
+{
+  trans_commit_stmt(thd);
I see no call to commit the entire transaction (as opposed to just the statement here). This may be intentional (autocommit?); in any case, please add a comment here explaining how the transaction is committed wrt. the statement-transaction and the full transaction. Maybe add a test where the server runs with autocommit globally disabled? Though I am not sure the short mysql-test-run.pl tests would notice if everything was done as one long transaction (large production sites undoubtedly would ;-).
+bool provisioning_send_info::send_query_log_event(DYNAMIC_STRING const *query,
Is there a particular reason that you chose DYNAMIC_STRING over the String class? I think String is more widely used for building queries and such? A common use pattern with String is to allocate a reasonably-sized buffer on the stack and initialise a String object from it. Then in the common case of smallish result, no my_malloc() calls are needed.
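The common pattern looks like this (a sketch):

  char buff[1024];
  String str(buff, sizeof(buff), system_charset_info);
  str.length(0);                 // Start empty, reusing the stack buffer
  str.append(STRING_WITH_LEN("SHOW CREATE TABLE "));
  // ... my_malloc() happens only if the result outgrows buff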
+  if (cs_info)
+  {
+    // thd->variables.character_set_client->number
+    int2store(evt.charset, cs_info->cs_client);
+    // thd->variables.collation_connection->number
+    int2store(evt.charset + 2, cs_info->cl_connection);
+    // cs_info->cl_server
+    int2store(evt.charset + 4, thd->variables.collation_server->number);
+    // thd->variables.collation_database->number
+    evt.charset_database_number= cs_info->cl_db;
Can you explain why you need to consider the charsets/collations of the connection and client? Could you not just always use the system character set for generating queries?
+  if (my_strcasecmp(system_charset_info, "VIEW", type->str) == 0)
+  {
+    char name_buff[NAME_LEN * 2 + 3];
+    DYNAMIC_STRING *str= (DYNAMIC_STRING*)my_malloc(sizeof(DYNAMIC_STRING),
+                                                    MYF(0));
+
+    quote_name(database->str, name_buff);
+    init_dynamic_string(str, name_buff,
+                        // Both lengths + dot and quotes, will be reallocated
+                        // only in special cases
+                        database->length + name->length + 5,
+                        5);
+    dynstr_append(str, ".");
+    quote_name(name->str, name_buff);
+    dynstr_append(str, name_buff);
+
+    views.push_back(str);
+  }
+  else if (my_strcasecmp(system_charset_info, "BASE TABLE", type->str) == 0)
+    tables.push_back(my_strdup(name->str, MYF(0)));
Why do you need to prepend the database name in VIEW case but not in TABLE case? Please add a comment explaining that.
+  else
+    DBUG_ASSERT(false &&
+                "Unexpected type of table returned by 'SHOW FULL TABLES'");
Probably you should add an error return here, that will cause provisioning to fail in the non-debug build, rather than silently press on.
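Something like (a sketch; adjust the return value to the function's convention):

  else
  {
    // Keep the assert for debug builds, but fail provisioning in release
    DBUG_ASSERT(false &&
                "Unexpected type of table returned by 'SHOW FULL TABLES'");
    error_text= "Unexpected type of table returned by 'SHOW FULL TABLES'";
    return 1;
  }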
+bool provisioning_send_info::event_to_packet(Log_event &evt, String &packet)
+{
+  // Reset cache for writing
+  reinit_io_cache(event_conversion_cache, WRITE_CACHE, 0, false, true);
+
+  if (evt.write(event_conversion_cache))
+  {
+    error_text= "Failed to write event to conversion cache";
+    return true;
+  }
+
+  if (reinit_io_cache(event_conversion_cache, READ_CACHE, 0, false, false))
+  {
+    error_text= "Failed to switch event conversion cache mode";
+    return true;
+  }
+
+  packet.set("\0", 1, &my_charset_bin);
+  if (packet.append(event_conversion_cache,
+                    event_conversion_cache->write_pos - event_conversion_cache->buffer))
+  {
+    error_text= "Failed to write event data to packet";
+    return true;
+  }
Hm. So first the event is constructed in-memory. Then it is written to an IO_CACHE. Then it is read back into a byte buffer. And then it is written to the slave's client socket.

Hm, I see - we have the client socket only as a NET, not an IO_CACHE, but the Log_event subclasses need an IO_CACHE to construct a serialised event. That's clunky, but I don't see an easy way to avoid it. Please add a comment explaining why this is necessary.

Also, doesn't the IO_CACHE spill to disk if the data written becomes large? So wouldn't a very large event (eg. a row containing a large BLOB) fail here, because the IO_CACHE then holds only part of the data in memory? I am not 100% sure how this works; please investigate, and test with a large row (if you did not already).
+  // Set current server as source of event, when slave registers on master,
+  // it overwrites thd->variables.server_id with its own server id - and it is
+  // then used when writing event
+  int4store(&packet[SERVER_ID_OFFSET + 1], global_system_variables.server_id);
+  return false;
I think this will fail if event checksums are enabled? Or rather, the patch does not seem to handle binlog checksums at all. Try adding a test that enables binlog checksums on master and slave, and check that everything works. And also - why do you need to change this in the first place? Doesn't the server_id get set correctly when the event is initially created?
+bool provisioning_send_info::send_event(Log_event &evt)
+{
+  String packet;
+
+  // Slave will accept this event even if it modifies data structures
+  evt.flags|= LOG_EVENT_PROVISIONING_F;
+
+  if (event_to_packet(evt, packet))
+    return true;
+
+  if (my_net_write(&thd->net, (uchar*)packet.ptr(), packet.length()))
+  {
+    error_text= "Failed to send event packet to slave";
+    return true;
+  }
Yeah... It would be so much more elegant if we could wrap the thd->net in an IO_CACHE, and write directly to the client. Did you investigate if this would be a possibility, without too much extra work?
+int8 provisioning_send_info::send_table_data()
+ if ((error= hdl->prepare_range_scan(row_batch_end, NULL)) != 0)
+  if (table->s->primary_key == MAX_KEY)
+  {
+    error_text= "Table does not contain primary key";
+    close_tables();
+    return 1;
+  }
It seems more natural to me to check for missing primary key _before_ prepare_range_scan(). (Otherwise one wonders how it is possible to prepare a range scan on a table without _any_ key? I suppose it works to do a table scan instead, but better do the check first to avoid having to wonder).
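Ie. a sketch of the reordering:

  // Check for a usable primary key first; the range scan below relies on it
  if (table->s->primary_key == MAX_KEY)
  {
    error_text= "Table does not contain primary key";
    close_tables();
    return 1;
  }

  if ((error= hdl->prepare_range_scan(row_batch_end, NULL)) != 0)
    ...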
+  while (packed_rows < PROV_ROW_BATCH_SIZE &&
+         (error= hdl->read_range_next()) == 0);
I think there should be two stop criteria. One is the maximum number of rows reached, the other is the accumulated size of the rows. At least, you need to protect against exceeding --max-allowed-packet; otherwise the slave will be unable to receive the too-large packet. Please add a test case that checks large rows and that row events are correctly split so as not to exceed max-allowed-packet (of course, at least one row must fit in one Rows_log_event). Check the code for normal binlogging to see how it splits up Rows_log_events to avoid creating too large an event.
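Roughly like this (a sketch; packed_size and the pack_current_row() helper are mine, just to show the idea):

  size_t packed_size= 0;
  while (packed_rows < PROV_ROW_BATCH_SIZE &&
         packed_size < thd->variables.max_allowed_packet &&
         (error= hdl->read_range_next()) == 0)
  {
    // Accumulate the packed size so the batch is cut before the
    // resulting Rows_log_event would exceed the packet limit
    packed_size+= pack_current_row();
    packed_rows++;
  }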
+  table->default_column_bitmaps();
+  close_tables();
So the critical issue here is: how can we be sure that we will not miss rows during provisioning, by racing between provisioning events and normal binlog events? This needs to be clearly thought out, and then explained in a comment somewhere in the code.

For example, you write the events to the slave client socket before closing the table and releasing locks. This seems critical (and correct) - because it ensures that if there is a DELETE made just after the provisioning read of the table, that DELETE will be sent only _after_ the provisioning rows (so we do not lose the DELETE). This should be explained in a comment.

But what about isolation level (for InnoDB)? Don't we need to run with a sufficiently high isolation level to be correct? Intuitively, I thought so, but maybe not; at least you have nothing in the patch about isolation level that I can see.

What about this scenario?

 1. Provisioning reads rows 20-29, sends them to the slave.
 2. An UPDATE runs that changes row 32 to now be row 25; it is written to the binary log and sent to the slave.
 3. Provisioning reads rows 30-39, sends them to the slave.

On the slave, the UPDATE from (2) should fail - because row 32 does not exist at this point. But then it seems to me that row 25 will never appear on the slave, right? So provisioning will be incorrect. Or am I missing something?

I wonder if the solution to this particular case is that during provisioning, an UPDATE (at least one that changes the primary key value) must be logged as a DELETE (of the old row) plus an INSERT (of the new row). Alternatively, maybe the slave could, during provisioning, execute the UPDATE regardless of the missing before-image, still writing the after-image (row 25) to the table.

More generally, we really need an explanation of how provisioning ensures that the end result on the slave is identical to the master - otherwise there may be other problems like this hidden.

Also, it would be really good if you could make a test case for the above potential issue. You can get some ideas from how I use DBUG_EXECUTE_IF together with DEBUG_SYNC in the rpl_parallel_*.test test cases. But I know it can be a lot of work to get such tricky race-provoking test cases to work reliably.
diff --git a/sql/rpl_provisioning.h b/sql/rpl_provisioning.h
+// FIXME - Farnham
+// Header
+
+#pragma once
We do not use this #pragma once in MariaDB (maybe that is what you intended to fix with the FIXME; mentioning it just in case). I assume you will take care of the other such FIXMEs in the patch yourself before pushing.
+/*
+  How much rows from table will be sent in one provisioning tick
+  FIXME - Farnham
+  Dynamic value
+*/
+#define PROV_ROW_BATCH_SIZE 5
Right. Probably just a matter of adding something to sql/sys_vars.cc
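Roughly (a sketch of a sys_vars.cc entry; the name, range and default are mine):

  static Sys_var_uint Sys_provisioning_row_batch_size(
         "provisioning_row_batch_size",
         "How many rows from a table are sent in one provisioning batch",
         GLOBAL_VAR(provisioning_row_batch_size), CMD_LINE(REQUIRED_ARG),
         VALID_RANGE(1, 100000), DEFAULT(100), BLOCK_SIZE(1));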
-  if (rpl_load_gtid_slave_state(thd))
+  if (!mi->provisioning_mode && rpl_load_gtid_slave_state(thd))
Actually, what should happen with the GTID position for LOAD DATA FROM MASTER? Just clearing it is probably not correct, in case the user wants to LOAD DATA FROM MASTER from multiple masters. Maybe the user should clear it herself, if necessary, and then it will be updated by the normal replication stream from the master. But why do you skip loading it here in provisioning mode? That does not seem right. First, it will normally be loaded already anyway. And second, if it is not loaded, then just having the wrong value in memory (as compared to the mysql.gtid_slave_pos table) does not seem correct?
+    DBUG_EXECUTE_IF("provisioning_test_running",
+                    if (!slave_errno)
+                      while (!mi->dump_requested_semaphore)
+                        /* pass */;);
Maybe put a my_sleep(20000) in with the /* pass */.
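Ie. something like:

  DBUG_EXECUTE_IF("provisioning_test_running",
                  if (!slave_errno)
                    while (!mi->dump_requested_semaphore)
                      my_sleep(20000););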
+  case PROVISIONING_DONE_EVENT:
+    mi->abort_slave= true;
+    /*
+      Fall through to default case ... and through debug code, nothing wrong
+      should happen
+    */
That seems a bit fragile. Better do like in other places: goto default_action;
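Ie.:

  case PROVISIONING_DONE_EVENT:
    mi->abort_slave= true;
    goto default_action;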
+  // In provisioning mode, master starts sending data from current position
+  if (flags & BINLOG_DUMP_PROVISIONING_MODE)
So provisioning assumes non-GTID mode? Maybe that is ok... But it should still be tested what happens if provisioning is attempted in GTID mode. Perhaps either GTID or non-GTID mode can work the same, since provisioning does not need to start at any particular position - and at the end of provisioning, both GTID and non-GTID position should be correct.
+bool
+Ed_connection::execute_direct(char *sql_text, size_t length)
+{
+  LEX_STRING tmp_query;
+  tmp_query.str= sql_text;
+  tmp_query.length= length ? length : strlen(sql_text);
+
+  return execute_direct(tmp_query);
+}
You are not actually ever calling this without passing a length. So please make this an inline method in the .h file, with a mandatory length argument, that does not check for zero length or call strlen().
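Something like this sketch:

  // Inline in the header; length is now mandatory, no strlen() fallback
  bool execute_direct(char *sql_text, size_t length)
  {
    LEX_STRING tmp_query= { sql_text, length };
    return execute_direct(tmp_query);
  }

 - Kristian.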