22 Jan
2020
22 Jan
'20
12:45 p.m.
GitHub branch is still in development phase , I will update it as soon as I implement more features , although r_ttt test in rpl suite can be runned. On Wed, Jan 22, 2020, 5:59 PM Sachin Setiya <sachin.setiya@mariadb.com> wrote: > Hi Everyone! > > This mail will describe usage and implementation of Lag Free Alter. And > this > will also answer the question raised by Kristian and Simon J Mudd(in 2016) > > Desc:- This will split Alter into 2 different commits. START ALTER and > COMMIT > /ROLLBACK ALTER , Start Alter will be written in binlog as soon as we get > the > locks for the table, alter will proceeds as usual and at the time of > writing > binlog if alter is successful we will write COMMIT Alter other wise > ROLLBACK > Alter. > > Usage:- Using this feature is quite simple. > 1. On master you have to turn on `BINLOG_SPLIT_ALTER` dynamic variable. > 2. Slave must be using parallel replication. > > Advance Usage:- > So alter is divided like this. > 1. START identifier Actual_alter_stmt > 2. COMMIT/ROLLBACK identifier Actual_alter_stmt. OR > 2. COMMIT/ROLLBACK identifier ALTER > > identifier is thread_id. > > Questions by Simon Mudd. > >>* this behaviour should be configurable? > Yes. > >> - new global variable on the master to allow injection of this > changed event stream? > Right , `BINLOG_SPLIT_ALTER` > >> - a new session variable set in the session where the command is > triggered ? > Right , `BINLOG_SPLIT_ALTER` > >> - on slave a setting to indicate how many INPLACE ALTER TABLEs can > run at once? > No setting so far , but I am thinking of maximum no of CONCURRENT ALTER = > slave_parallel_threads > >>* how does a DBA monitor what’s going on? > >> - the progress and number of active DDL statements > So as we there is 2 part, So progress will go like this > 1. Executing of start alter (this will take most of time) > 2. Waiting for commit/rollback Signal > 3. Commit/ Rollback Alter. > Number of active ALTER , these will create new threads so DBA can know > using this > or I am thinking of adding variable in SHOW SLAVE INFO. which will > show active DDL. > >> - please consider adding counters/metrics for: > >> * number of “asynchronous DDLs” in progress / completed successfully > / failed or rolled back > Okay, We can have counter for these metrics. If i get time to implement > this. > >> * sizes of the DDL changes made so far > Not sure if we need this. > >> * number of threads busy in this state (maybe can be implied from > SHOW PROCESSLIST or equivalent but nicer to have an easy to query metric) > Show processlist will show the busy threads. > Sample Output when concurrent alter is running on slave. > slave_parallel_threads= 10 > Every 1.0s: mysql -uroot -S > /home/sachin/alter/build/mysql-test/var/tmp/mysqld.2.sock -e show > processlist > sachin-sp52: Wed Jan 22 16:56:00 2020 > > Id User Host db Command Time State Info Progress > 1 system user NULL Daemon NULL InnoDB purge > coordinator NULL 0.000 > 3 system user NULL Daemon NULL InnoDB purge > worker NULL 0.000 > 2 system user NULL Daemon NULL InnoDB purge > worker NULL 0.000 > 4 system user NULL Daemon NULL InnoDB purge > worker NULL 0.000 > 5 system user NULL Daemon NULL InnoDB > shutdown handler NULL 0.000 > 10 root localhost:44478 test Sleep 6 NULL > 0.000 > 11 root localhost:44480 test Sleep 7 NULL > 0.000 > 15 root localhost:44492 test Sleep 6 NULL > 0.000 > 16 root localhost:44494 test Sleep 6 NULL > 0.000 > 17 system user NULL Slave_IO 6 > Waiting for master to send event NULL 0.000 > 19 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 22 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 20 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 23 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 24 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 25 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 26 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 21 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 27 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 28 system user NULL Slave_worker 6 > Waiting for work from SQL thread NULL 0.000 > 18 system user NULL Slave_SQL 6 Slave > has read all relay log; waiting for the slave I/O thread to update it > NULL 0.000 > 29 system user test Slave_worker 6 NULL > /*!100001 START 19 alter table t3 add column c int, force, > algorithm=inplace */ 0.000 > 30 system user test Slave_worker 6 NULL > /*!100001 START 17 alter table t1 add column c int, force, > algorithm=inplace */ 0.000 > 31 system user test Slave_worker 6 NULL > /*!100001 START 18 alter table t2 add column c int, force, > algorithm=inplace */ 0.000 > 32 system user test Slave_worker 6 NULL > /*!100001 START 20 alter table t4 add column c int, force, > algorithm=inplace */ 0.000 > 33 system user test Slave_worker 6 NULL > /*!100001 START 21 alter table t5 add column c int, force, > algorithm=inplace */ 0.000 > 34 root localhost NULL Query 0 Init show > processlist 0.000 > > >>* what happens if the commit/rollback ddl event never arrives? (the > statement on the master could be skipped for one of several reasons) > So here are 2 options > If slave is still running DBA can manually run > COMMIT/ROLLBACK identifier alter_stmt/ALTER , where identifier is same as > START ALTER identifier. > so this will either commit or rollback alter. > If slave is not running. > Since slave is not running all context is lost , so this will take as much > as > time a normal alter. > >> - I assume that users on the slave see the old structure all the time > until it completes > Right, we work on copy of alter for myisam and for other case innodb will > take > care of it. > >> - would this have a storage or performance penalty on the slave if the > commit/rollback DDL event never arrives? > Not performance but yes storage and ram penalty. > >> - can a “commit” / “rollback” be manually triggered by the DBA in such > circumstances? > yes. > >>* what happens if the server crashes while this process (or these > processes) are ongoing when the server starts up again? > So if we crash in START ALTER no issue , we can run it again , since it is > transactional(we are working on copy so ..) > If we crash after start alter and we receive COMMIT alter then it will be > treated as normal alter. > ROLLBACK ALTER will be instant. > If we crash in COMMIT ALTER this will be same as crash in normal ALTER > table , > so DBA has to deal with it. > > Architecture:- > Lets look at sample alter perf data for innodb and myisam > Myisam > + 96.72% 0.00% mysqld mysqld [.] mysql_parse > + 96.72% 0.00% mysqld mysqld [.] > mysql_execute_command > + 96.72% 0.00% mysqld mysqld [.] > Sql_cmd_alter_table::execute > - 96.72% 0.00% mysqld mysqld [.] mysql_alter_table > - mysql_alter_table > - 96.63% copy_data_between_tables > + 79.64% handler::ha_write_row > + 6.24% TABLE::update_default_fields > + 5.71% READ_RECORD::read_record > + 1.87% do_copy_null > 1.05% Field::do_field_int > + 0.95% _mcount > InnoDB > + 41.27% 0.00% mysqld mysqld [.] mysql_parse > + 41.27% 0.00% mysqld mysqld [.] > mysql_execute_command > + 41.27% 0.00% mysqld mysqld [.] > Sql_cmd_alter_table::execute > - 41.27% 0.00% mysqld mysqld [.] mysql_alter_table > - mysql_alter_table > - 41.27% mysql_inplace_alter_table > - 41.15% handler::ha_inplace_alter_table > ha_innobase::inplace_alter_table > - row_merge_build_indexes > - 41.00% row_merge_read_clustered_index > + 22.88% row_merge_insert_index_tuples > + 5.89% row_build_w_add_vcol > + 5.20% BtrBulk::finish > + 3.20% row_merge_buf_add > + 0.87% page_cur_move_to_next > 0.62% rec_get_offsets_func > So as we can see most of the work is done by copy_data_between_tables/ > mysql_inplace_alter_table. > Master Side:- > There is not much change into master side. > So after locking table and before executing these functions we write > START _id_ alter_stmt into binlog. > And at the time of write_bin_log we write > COMMIT _id_ alter_stmt into binlog. > So alter statement will be divided into 2 , hence 2 gtid. START_ALTER will > have special flag FL_START_ALTER_E1(4), this will be used on slave side to > create new worker thread for start alter processing. > > Slave Side:- > This require parallel replication on slave side. > So in do_event when we get a gtid_event with FL_START_ALTER_E1, > ::choose_thread > will call rpl_parallel_add_extra_worker, which will create a new worker > thread > and this will do start alter processing. So the gtid_log_event and next > Query_log_event will be scheduled to this new thread. And this thread will > be > exited as soon as alter is finished. > On slave side START ALTER will binlog after successfully getting MDL lock > and > thd lock. Untill this point we are executed as DDL(so new GCO), but after > getting locks we will call finish_event_group. So that new events can > proceed in > parallel. > Then it will continue to execute code in mysql_alter_table until > we reach a non transactional part(like renaming table/dropping table in > myisam) > So before executing NON-Transactional part it will wait signal from other > worker > thread to either abort or proceed forward. We will add a entry into mi-> > start_alter_list, with thread_id(of master) as a key. > > COMMIT/ROLLBACK Alter is treated as a normal query_log_event so it will be > assigned normal worker. This command will take SQLCOM_COMMIT_PREVIOUS path > in > mysql_execute_command. We will simple search for thread_id into > start_alter_list > and change status from ::WAITING to ::COMMIT_ALTER and signal the wait > condition > if we dont find the thread_id we will wait on wait condition, So > simple it is just > consumer producer between start alter and commit/rollback. > > Questions by Kristian:- > >Can you explain what you mean by "true LOCK=NONE"? It is not clear from > your > >description. I think you mean something that will allow to run an ALTER > >TABLE on the slave in parallel with the same statement on the master? > Yes > >There will be a number of things to consider with this. Previously, all > >transactions in the binlog have been atomic; this introduces related > events > >that can be arbitrarily separated in the binlog. > >For example, suppose the slave stops in the middle between BEGIN_DDL_EVENT > >and COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT. What is then the slave's binlog > >position / GTID position? > Now we are using 2 gtid for ALTER , So it wont be issue. > >Hopefully, the exiting pool of slave worker threads (currently used for > >parallel replication) can be used for this as well? Doesn't seem right to > >introduce a new kind of thread pool... > We are using same(global) thread pool but I am creating new threads for > START > ALTER. Otherwise we will have a deadlock , Suppose if we have 5 concurrent > START > ALTER and just 5 slave-worker-threads , all worker will be waiting for > COMMIT/ROLLBACK, but we cant execute commit rollback because we dont have > any > free worker. There can be more case when we have concurrent DML too , > so creating > a new thread was the safest option. > >Won't you need some mechanism to prevent following events to access the > >table before ALTER TABLE in a worker thread can acquire the metadata lock? > Right, So Untill Locking START ALTER is executed as DDL so no following > event > will be executed before locking. > And one more thing regarding locks , So there will we only one thread will > be > doing whole work of alter, (we can just assume that there is some > sleep on slave) > so execution on master and slave will be equivalent. > >There are some nasty issues to be aware of with potential deadlocks > related > >to FLUSH TABLES WITH GLOBAL READ LOCK and such, when multiple threads are > >trying to coordinate waiting and locking; there were already some quite > hard > >issues with this for parallel replication. > I have to test for it. > > Code branch bb-10.5-olter > Jira (Mdev-11675(https://jira.mariadb.org/browse/MDEV-11675)) > Regards > Sachin > > > -- > Regards > Sachin Setiya > Software Engineer at MariaDB >