[Maria-developers] MDEV-7502 Automatic provisioning of slave – retrieval of data from master
Hello Kristian,

After the master-slave connection part, I proceeded to the retrieval of data from the master. There are two types of data that the provisioning master will have to send: meta-data and row data.

Meta-data

For this I found only one method: pass the output of SHOW CREATE * to the slave in a 'Query_log_event'. While looking for alternatives, I found the MySQL online backup worklog, which describes a similar or identical problem under "High Level Architecture – Decision" ( http://dev.mysql.com/worklog/task/?id=3574 ). Because of this, I find it unlikely that there is currently a better way.

Because meta-data retrieval will take only a small fraction of the provisioning time, I think it could be implemented using the 'Ed_connection' class, which executes a query in string form and returns the result set from within the replication thread. The disadvantage of this solution is that the query has to go through the parser and all the steps of generic execution. On the other hand, the resulting code will be more independent of the exact implementations and their pre-conditions, and won't contain any logical mistakes (locking, etc.).

Row data

For this, the same method with 'Ed_connection' could work, but the overhead (parser, result processing) would probably be too big. So I created a low-level loop which goes through the records of a table and fills them into a 'Write_rows_log_event'. Could you please check whether it looks acceptable? I will then rewrite it into a generic form. It is the provisioning_send_info::send_table_data() function.

https://github.com/f4rnham/server/blob/3d36cbfc20dff73d92ce61168dcc9805d68d3...

Right now it works as expected on a small test case. It goes through the whole table at once and creates multiple row events when one would be enough. The same file also contains a test of 'Ed_connection' and a test of the slave being able to receive and process a 'Query_log_event'.

Thanks for advice,
Martin.
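The remark about creating more row events than necessary can be illustrated with a small standalone sketch. It is not the code from the linked branch and does not use the server's 'Write_rows_log_event' class. RowsEvent, pack_rows and max_event_bytes are hypothetical names, rows are modelled as already-encoded strings, and the per-event byte budget is an assumption. The idea is only to keep appending rows to the current event and start a new one when the budget would be exceeded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one rows event: the encoded rows it carries
// and their total size in bytes.
struct RowsEvent {
  std::vector<std::string> rows;
  std::size_t bytes;
  RowsEvent() : bytes(0) {}
};

// Pack encoded rows into as few events as the byte budget allows.
// A row larger than the budget still gets an event of its own.
std::vector<RowsEvent> pack_rows(const std::vector<std::string> &rows,
                                 std::size_t max_event_bytes)
{
  assert(max_event_bytes > 0);
  std::vector<RowsEvent> events;
  for (std::size_t i = 0; i < rows.size(); ++i)
  {
    const std::string &row = rows[i];
    // Open a new event if there is none yet, or if appending this row
    // would push the current event over the budget.
    if (events.empty() ||
        events.back().bytes + row.size() > max_event_bytes)
      events.push_back(RowsEvent());
    events.back().rows.push_back(row);
    events.back().bytes += row.size();
  }
  return events;
}
```

With a budget of 10 bytes, rows of 4, 4 and 2 bytes share one event, while a following 12-byte row and a 1-byte row each end up in an event of their own.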
Martin Kaluznik <martin.kaluznik@gmail.com> writes:
> Meta-data
> For this I found only one method, pass output of SHOW CREATE * to slave in 'Query_log_event'. When I was looking for alternatives, I have found MySQL online backup work log where they are describing similar/same problem under High Level Architecture – Decision ( http://dev.mysql.com/worklog/task/?id=3574 ). Because of this, I find it unlikely, that there currently is better way.
Agreed. This is also the method used by e.g. mysqldump.
> Because meta-data retrieval will take only small fraction of provisioning time, I think, that it could be implemented using 'Ed_connection' class, it executes query in string form and returns result set from within replication thread. Disadvantage of this
I am not familiar with Ed_connection, but it seems fine. I agree performance is not an issue here. I believe there are various functions to build CREATE TABLE statements (and similar), but if Ed_connection works, I see no reason not to use it.
> Row data
> For this, same method with 'Ed_connection' could work, but it would be probably too big overhead – parser, result processing. So I created low level loop which goes through records of table and fills them in 'Write_rows_log_event'. Could you please check it, if it looks acceptable and I will then rewrite it to generic form, it is provisioning_send_info::send_table_data() function. https://github.com/f4rnham/server/blob/3d36cbfc20dff73d92ce61168dcc9805d68d3...
Yes, it seems a good start. Again, the exact way to open tables and access them, and to handle transaction start/end and lock release, is not something I am very familiar with. But you already seem to have gotten something working, so I suggest continuing with the generic code. When you have something, or if you have a question I cannot answer, I will find someone else to help, probably Monty or Serg.

I wrote some simple code that scans a table when I implemented GTID. Maybe you can get inspiration from that. Look at rpl_slave_state::record_gtid() and rpl_load_gtid_slave_state(). rpl_slave_state::record_gtid() works both as part of an existing transaction and as a stand-alone transaction if there is no existing transaction, so it might help you with the lock release stuff. Monty reviewed that code at one point, so it should be ok.

The idea with MDEV-7502 is to scan by primary key, so that we can send the table in chunks and not keep a read transaction open for a long time on large tables. Your code already uses the primary key for the scan, I think, so you probably already planned this. It is probably a good idea in any case to start by getting a simpler version working where everything is done in one scan, and then add the chunking later when the simple stuff works.

You should probably check whether the table actually has a primary key. I think it is fine to give an error if it does not. Or we could fall back to a full scan without chunking for such tables; rpl_load_gtid_slave_state() does such a full table scan.

Looks good to me so far.

Thanks,

 - Kristian.
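The chunked primary-key scan described above can be sketched outside the server as follows. This is only an illustration of the idea, not MariaDB code: the table is modelled as a std::map ordered by an integer primary key, and Table, Row, read_chunk and scan_in_chunks are hypothetical names. Each call to read_chunk stands in for one short read that resumes strictly after the last key already sent, so no long-lived read transaction is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table: rows ordered by an integer primary key.
typedef std::map<int, std::string> Table;
typedef std::pair<int, std::string> Row;

// Read up to chunk_size rows. When have_last is true, the read resumes at
// the first key strictly greater than last_key; otherwise it starts at the
// beginning of the table.
std::vector<Row> read_chunk(const Table &table, bool have_last, int last_key,
                            std::size_t chunk_size)
{
  assert(chunk_size > 0);
  std::vector<Row> chunk;
  Table::const_iterator it =
      have_last ? table.upper_bound(last_key) : table.begin();
  for (; it != table.end() && chunk.size() < chunk_size; ++it)
    chunk.push_back(Row(it->first, it->second));
  return chunk;
}

// Walk the whole table chunk by chunk, remembering only the last key seen
// between reads.
std::vector<std::vector<Row> > scan_in_chunks(const Table &table,
                                              std::size_t chunk_size)
{
  std::vector<std::vector<Row> > chunks;
  bool have_last = false;
  int last_key = 0;
  for (;;)
  {
    std::vector<Row> chunk =
        read_chunk(table, have_last, last_key, chunk_size);
    if (chunk.empty())
      break;  // no keys beyond last_key: the scan is complete
    have_last = true;
    last_key = chunk.back().first;
    chunks.push_back(chunk);
  }
  return chunks;
}
```

A table without a primary key has no key to resume from, which is why such tables would need either an error or the single full scan mentioned above.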
Participants (2): Kristian Nielsen, Martin Kaluznik