Hello Sergei, good morning. Thank you for the review comments. Please find my replies inline.

On 21/03/21 8:21 pm, Sergei Golubchik wrote:
Hi, Sujatha!
Could you split this patch, please?
Sure.
1. Just add the replication_applier_status_by_worker table. With a CHANNEL_NAME column and (perhaps, not sure if it applies) a WORKER_ID column. Without the extra columns and backup.
The reason for not including CHANNEL_NAME and WORKER_ID is that multi-source parallel replication works a bit differently in MySQL. Please see the following:

https://dev.mysql.com/doc/refman/5.7/en/replication-channels.html

"A multi-source replica can also be set up as a multi-threaded replica, by setting the slave_parallel_workers system variable to a value greater than 0. When you do this on a multi-source replica, each channel on the replica has the specified number of applier threads, plus a coordinator thread to manage them. You cannot configure the number of applier threads for individual channels."

In MySQL, worker threads are dedicated to a particular channel. In MariaDB, "the pool of replication worker threads is shared among all multi-source master connections, and among all replication domains that can replicate in parallel using out-of-order". Because of this I didn't include CHANNEL_NAME. MariaDB Slave_worker threads also don't have a WORKER_ID; they only have a 'thread_id'.
2. Add extra columns.
Ack.
3. Add backup.
With these three commits you'll have exactly the same diff as now, just split (and with CHANNEL_NAME column).
But really, I wonder whether this backup functionality is needed at all? In MySQL there is persistent information about these workers: it doesn't go away when they're stopped, it's stored persistently in a table, indexed by worker_id. If we don't have anything persistent like that and all workers completely disappear into oblivion, then maybe replication_applier_status_by_worker should not show anything when they aren't running?
MySQL doesn't persist the worker information in a table. When workers are stopped due to an error or STOP SLAVE, the worker information is copied (backed up) and retained until the next START SLAVE. Please find the following snippets.

File Name: sql/rpl_rli_pdb.cc
=======

static void slave_stop_workers(Relay_log_info *rli, bool *mts_inited) {
  ....
  /*
    Make copies for reporting through the performance schema tables.
    This is preserved until the next START SLAVE.
  */
  Slave_worker *worker_copy = new Slave_worker(
      nullptr,
#ifdef HAVE_PSI_INTERFACE
      &key_relay_log_info_run_lock, &key_relay_log_info_data_lock,
      &key_relay_log_info_sleep_lock, &key_relay_log_info_thd_lock,
      &key_relay_log_info_data_cond, &key_relay_log_info_start_cond,
      &key_relay_log_info_stop_cond, &key_relay_log_info_sleep_cond,
#endif
      w->id, rli->get_channel());
  worker_copy->copy_values_for_PFS(w->id, w->running_status, w->info_thd,
                                   w->last_error(),
                                   w->get_gtid_monitoring_info());
  rli->workers_copy_pfs.push_back(worker_copy);
}

/*
  This function is used to make a copy of the worker object before we
  destroy it while STOP SLAVE. This new object is then used to report the
  worker status until next START SLAVE following which the new worker
  objects will be used.
*/
void Slave_worker::copy_values_for_PFS(ulong worker_id,
                                       en_running_state thd_running_status,
                                       THD *worker_thd,
                                       const Error &last_error,
                                       Gtid_monitoring_info *monitoring_info) {
  id = worker_id;
  running_status = thd_running_status;
  info_thd = worker_thd;
  m_last_error = last_error;
  monitoring_info->copy_info_to(get_gtid_monitoring_info());
}

Please provide your suggestion.

Thank you
S.Sujatha
If you agree, then you'll only need two commits.
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
On Mar 21, Sujatha wrote:
revision-id: e5fc78f84e3 (mariadb-10.5.2-303-ge5fc78f84e3)
parent(s): 8b8969929d7
author: Sujatha <sujatha.sivakumar@mariadb.com>
committer: Sujatha <sujatha.sivakumar@mariadb.com>
timestamp: 2020-11-27 12:59:42 +0530
message:
MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker
Fix:
===
Iterate through rpl_parallel_thread_pool and display slave worker thread specific information as part of the 'replication_applier_status_by_worker' table.
---------------------------------------------------------------------------------
|Column Name:           | Description:                                          |
|-------------------------------------------------------------------------------|
|THREAD_ID              | Thread_Id as displayed in the                         |
|                       | 'performance_schema.threads' table for a thread with  |
|                       | name 'thread/sql/rpl_parallel_thread'.                |
|                       |                                                       |
|                       | THREAD_ID will be NULL when worker threads are        |
|                       | stopped due to an error/force stop.                   |
|                       |                                                       |
|SERVICE_STATE          | Whether the thread is running or not                  |
|                       |                                                       |
|LAST_SEEN_TRANSACTION  | Last GTID executed by the worker                      |
|                       |                                                       |
|LAST_ERROR_NUMBER      | Last error that occurred on a particular worker       |
|                       |                                                       |
|LAST_ERROR_MESSAGE     | Message of the last error                             |
|                       |                                                       |
|LAST_ERROR_TIMESTAMP   | Timestamp of the last error                           |
|                       |                                                       |
|WORKER_IDLE_TIME       | Total idle time in seconds that the worker thread has |
|                       | spent waiting for work from the SQL thread            |
|                       |                                                       |
|LAST_TRANS_RETRY_COUNT | Total number of retries attempted by the last         |
|                       | transaction                                           |
---------------------------------------------------------------------------------
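For context, the table would be read like any other performance schema table; this is only a hypothetical usage fragment (it requires a running server with the feature, so it is not runnable on its own):

```sql
-- Inspect per-worker applier status; rows reflect live workers while the
-- slave runs, and the preserved data after an error or forced stop.
SELECT THREAD_ID, SERVICE_STATE, LAST_SEEN_TRANSACTION,
       LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP,
       WORKER_IDLE_TIME, LAST_TRANS_RETRY_COUNT
  FROM performance_schema.replication_applier_status_by_worker;
```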
When STOP SLAVE is executed, the worker threads are gone, so querying the table at that stage would return empty rows. To address this, when worker threads are about to stop due to an error or a forced stop, a backup pool is created that preserves the data needed to populate the performance schema table. The backup pool is cleared upon slave start.