[Maria-developers] Problem that some variables can be changed without stopping the slave IO thread
Monty, This patch of yours introduced a change where before both slave threads must be stopped to change @@replicate_events_marked_for_skip, but after only the SQL thread is checked: ----------------------------------------------------------------------- commit 572560f38c248d5020f0e63aeb3f8905cd568208 Author: Michael Widenius <monty@askmonty.org> Date: Wed Oct 3 01:44:54 2012 +0300 Changed SHOW_FUNC variabels that don't return SHOW_ARRAY to SHOW_SIMPLE_FUNC. + Master_info_index::give_error_if_slave_running() + + @return + TRUE If some slave is running. An error is printed + FALSE No slave is running +*/ + +bool Master_info_index::give_error_if_slave_running() +{ + DBUG_ENTER("warn_if_slave_running"); + mysql_mutex_assert_owner(&LOCK_active_mi); + + for (uint i= 0; i< master_info_hash.records; ++i) + { + Master_info *mi; + mi= (Master_info *) my_hash_element(&master_info_hash, i); + if (mi->rli.slave_running != MYSQL_SLAVE_NOT_RUN) + { + my_error(ER_SLAVE_MUST_STOP, MYF(0), (int) mi->connection_name.length, + mi->connection_name.str); + DBUG_RETURN(TRUE); + } + } + DBUG_RETURN(FALSE); +} bool Sys_var_replicate_events_marked_for_skip::global_update(THD *thd, set_var *var) ... - /* Slave threads must be stopped to change the variable. */ + mysql_mutex_unlock(&LOCK_global_system_variables); mysql_mutex_lock(&LOCK_active_mi); - lock_slave_threads(active_mi); - init_thread_mask(&thread_mask, active_mi, 0 /*not inverse*/); - if (thread_mask) // We refuse if any slave thread is running - { - my_error(ER_SLAVE_MUST_STOP, MYF(0)); - result= true; - } - else + if (!master_info_index->give_error_if_slave_running()) ----------------------------------------------------------------------- So instead of checking both IO and SQL thread, now it checks only SQL thread (mi->rli.slave_running). Why was this change made? Was it by mistake? Unfortunately, I did not notice this before, so I subsequently used the same code (give_error_if_slave_running()) for the following variables: gtid_slave_pos slave_parallel_threads slave_domain_parallel_threads replicate_events_marked_for_skip gtid_ignore_duplicates This means that all of the above variables can be changed while the IO thread is still running. This was never the intention! There are some complex interactions between the master, the IO thread, and the SQL thread, particularly with respect to the slave's GTID position, and parts of the code are specifically written on the assumption that certain things cannot change without first stopping all replication threads. (I do not have the full overview of exactly what can break, I think that would require a careful review of large parts of the code, but I fear that it can corrupt not just the replication GTID position and possibly even internal server structures). So I am wondering what to do ... it is easy to fix give_error_if_slave_running() to check for both threads, but this is a change in behaviour (even if it should match documentation) that could easily break administration scripts, eg. after a 10.0 automatic security-fix upgrade. I also know that eg. Jean François has written about how he uses the feature of being able to not stop the IO thread in some cases. On the other hand, this can cause serious problems for users, as they have no real way of knowing not to attempt to change these variables without stopping IO thread first, when there is no error given. MDEV-9138 describes a complex scenario where slave position gets corrupted from this. Any suggestions? Thanks, - Kristian.
participants (1)
-
Kristian Nielsen