Re: [Maria-developers] e5fc78f84e3: MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker
Hi, Sujatha! Could you split this patch, please? 1. Just add replication_applier_status_by_worker table. With CHANNEL_NAME column and (perhaps, not sure if it applies) with WORKER_ID column. Without extra columns and backup. 2. Add extra columns. 3. Add backup. With these three commits you'll have exactly the same diff as now, just split (and with CHANNEL_NAME column). But really, I wonder whether this backup functionality is needed at all? In MySQL there is persistent information about these workers, it doesn't go away when they're stopped, it's stored persistently in a table, indexed by worker_id. If we don't have anything persistent like that and all workers completely disappear into oblivion, then, may be, replication_applier_status_by_worker should not show anything when they aren't running? If you agree, then you'll only need two commits. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org On Mar 21, Sujatha wrote:
revision-id: e5fc78f84e3 (mariadb-10.5.2-303-ge5fc78f84e3) parent(s): 8b8969929d7 author: Sujatha <sujatha.sivakumar@mariadb.com> committer: Sujatha <sujatha.sivakumar@mariadb.com> timestamp: 2020-11-27 12:59:42 +0530 message:
MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker
Fix: === Iterate through rpl_parallel_thread_pool and display slave worker thread specific information as part of 'replication_applier_status_by_worker' table.
--------------------------------------------------------------------------------- |Column Name: | Description: | |-------------------------------------------------------------------------------| | | | |THREAD_ID | Thread_Id as displayed in 'performance_schema.threads'| | | table for thread with name | | | 'thread/sql/rpl_parallel_thread' | | | | | | THREAD_ID will be NULL when worker threads are stopped| | | due to error/force stop | | | | |SERVICE_STATE | Thread is running or not | | | | |LAST_SEEN_TRANSACTION | Last GTID executed by worker | | | | |LAST_ERROR_NUMBER | Last Error that occurred on a particular worker | | | | |LAST_ERROR_MESSAGE | Last error specific message | | | | |LAST_ERROR_TIMESTAMP | Time stamp of last error | | | | |WORKER_IDLE_TIME | Total idle time in seconds that the worker thread has | | | spent waiting for work from SQL thread | | | | |LAST_TRANS_RETRY_COUNT | Total number of retries attempted by last transaction | ---------------------------------------------------------------------------------
In case STOP SLAVE is executed worker threads will be gone, hence worker threads will be unavailable. Querying the table at this stage will give empty rows. To address this case when worker threads are about to stop, due to an error or forced stop, create a backup pool and preserve the data which is relevant to populate performance schema table. Clear the backup pool upon slave start.
Hello Sergei, Good Morning. Thank for the review comments. Please find my replies inline. On 21/03/21 8:21 pm, Sergei Golubchik wrote:
Hi, Sujatha!
Could you split this patch, please?
Sure.
1. Just add replication_applier_status_by_worker table. With CHANNEL_NAME column and (perhaps, not sure if it applies) with WORKER_ID column. Without extra columns and backup.
Reason for not including CHANNEL_NAME and WORKER_ID is, multi-source based parallel replication is bit different in MySQL. Please find the following information. ==> https://dev.mysql.com/doc/refman/5.7/en/replication-channels.html A multi-source replica can also be set up as a multi-threaded replica, by setting the slave_parallel_workers system variable to a value greater than 0. When you do this on a multi-source replica, each channel on the replica has the specified number of applier threads, plus a coordinator thread to manage them. You cannot configure the number of applier threads for individual channels. In case of MySQL, worker threads are dedicated to particular channel. In MariaDB "The pool of replication worker threads is shared among all multi-source master connections, and among all replication domains that can replicate in parallel using out-of-order". Because of this I didn't include CHANNEL_NAME. In MariaDB Slave_worker thread's don't have WORKER_ID they only have 'thread_id'.
2. Add extra columns.
Ack.
3. Add backup.
With these three commits you'll have exactly the same diff as now, just split (and with CHANNEL_NAME column).
But really, I wonder whether this backup functionality is needed at all? In MySQL there is persistent information about these workers, it doesn't go away when they're stopped, it's stored persistently in a table, indexed by worker_id. If we don't have anything persistent like that and all workers completely disappear into oblivion, then, may be, replication_applier_status_by_worker should not show anything when they aren't running?
MySQL doesn't persist the worker information in a table. When workers are stopped due to an error or STOP SLAVE, worker information is copied(backup) and it is retained till next START SLAVE. Please find following snippets. File Name: sql/rpl_rli_pdb.cc ======= static void slave_stop_workers(Relay_log_info *rli, bool *mts_inited) { .... /* Make copies for reporting through the performance schema tables. This is preserved until the next START SLAVE. */ Slave_worker *worker_copy = new Slave_worker( nullptr, #ifdef HAVE_PSI_INTERFACE &key_relay_log_info_run_lock, &key_relay_log_info_data_lock, &key_relay_log_info_sleep_lock, &key_relay_log_info_thd_lock, &key_relay_log_info_data_cond, &key_relay_log_info_start_cond, &key_relay_log_info_stop_cond, &key_relay_log_info_sleep_cond, #endif w->id, rli->get_channel()); worker_copy->copy_values_for_PFS(w->id, w->running_status, w->info_thd, w->last_error(), w->get_gtid_monitoring_info()); rli->workers_copy_pfs.push_back(worker_copy); } /* This function is used to make a copy of the worker object before we destroy it while STOP SLAVE. This new object is then used to report the worker status until next START SLAVE following which the new worker objetcs will be used. */ void Slave_worker::copy_values_for_PFS(ulong worker_id, en_running_state thd_running_status, THD *worker_thd, const Error &last_error, Gtid_monitoring_info *monitoring_info) { id = worker_id; running_status = thd_running_status; info_thd = worker_thd; m_last_error = last_error; monitoring_info->copy_info_to(get_gtid_monitoring_info()); } Please provide your suggestion. Thank you S.Sujatha
If you agree, then you'll only need two commits.
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
On Mar 21, Sujatha wrote:
revision-id: e5fc78f84e3 (mariadb-10.5.2-303-ge5fc78f84e3) parent(s): 8b8969929d7 author: Sujatha <sujatha.sivakumar@mariadb.com> committer: Sujatha <sujatha.sivakumar@mariadb.com> timestamp: 2020-11-27 12:59:42 +0530 message:
MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker
Fix: === Iterate through rpl_parallel_thread_pool and display slave worker thread specific information as part of 'replication_applier_status_by_worker' table.
--------------------------------------------------------------------------------- |Column Name: | Description: | |-------------------------------------------------------------------------------| | | | |THREAD_ID | Thread_Id as displayed in 'performance_schema.threads'| | | table for thread with name | | | 'thread/sql/rpl_parallel_thread' | | | | | | THREAD_ID will be NULL when worker threads are stopped| | | due to error/force stop | | | | |SERVICE_STATE | Thread is running or not | | | | |LAST_SEEN_TRANSACTION | Last GTID executed by worker | | | | |LAST_ERROR_NUMBER | Last Error that occurred on a particular worker | | | | |LAST_ERROR_MESSAGE | Last error specific message | | | | |LAST_ERROR_TIMESTAMP | Time stamp of last error | | | | |WORKER_IDLE_TIME | Total idle time in seconds that the worker thread has | | | spent waiting for work from SQL thread | | | | |LAST_TRANS_RETRY_COUNT | Total number of retries attempted by last transaction | ---------------------------------------------------------------------------------
In case STOP SLAVE is executed worker threads will be gone, hence worker threads will be unavailable. Querying the table at this stage will give empty rows. To address this case when worker threads are about to stop, due to an error or forced stop, create a backup pool and preserve the data which is relevant to populate performance schema table. Clear the backup pool upon slave start.
Hi, Sujatha! On Mar 22, sujatha wrote:
On 21/03/21 8:21 pm, Sergei Golubchik wrote:
Hi, Sujatha!
Could you split this patch, please?
Sure.
1. Just add replication_applier_status_by_worker table. With CHANNEL_NAME column and (perhaps, not sure if it applies) with WORKER_ID column. Without extra columns and backup.
Reason for not including CHANNEL_NAME and WORKER_ID is, multi-source based parallel replication is bit different in MySQL.
Please find the following information.
==> https://dev.mysql.com/doc/refman/5.7/en/replication-channels.html
A multi-source replica can also be set up as a multi-threaded replica, by setting the slave_parallel_workers system variable to a value greater than 0. When you do this on a multi-source replica, each channel on the replica has the specified number of applier threads, plus a coordinator thread to manage them. You cannot configure the number of applier threads for individual channels.
In case of MySQL, worker threads are dedicated to particular channel.
In MariaDB "The pool of replication worker threads is shared among all multi-source master connections, and among all replication domains that can replicate in parallel using out-of-order".
Because of this I didn't include CHANNEL_NAME. In MariaDB Slave_worker thread's don't have WORKER_ID they only have 'thread_id'.
Okay, thanks May be it'd still make sense to keep CHANNEL_NAME, for the connection name the worker is currently working for? Even if the pool is shared, at any given moment in time the worker can apply a transaction only from some specific connection, right?
3. Add backup.
With these three commits you'll have exactly the same diff as now, just split (and with CHANNEL_NAME column).
But really, I wonder whether this backup functionality is needed at all? In MySQL there is persistent information about these workers, it doesn't go away when they're stopped, it's stored persistently in a table, indexed by worker_id. If we don't have anything persistent like that and all workers completely disappear into oblivion, then, may be, replication_applier_status_by_worker should not show anything when they aren't running?
MySQL doesn't persist the worker information in a table. When workers are stopped due to an error or STOP SLAVE, worker information is copied(backup) and it is retained till next START SLAVE. Please find following snippets.
Well, there is mysql.slave_worker_info, it's persistent. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hello Sergei, On 22/03/21 5:42 pm, Sergei Golubchik wrote:
Hi, Sujatha!
On Mar 22, sujatha wrote:
Hi, Sujatha!
Could you split this patch, please? Sure. 1. Just add replication_applier_status_by_worker table. With CHANNEL_NAME column and (perhaps, not sure if it applies) with WORKER_ID column. Without extra columns and backup. Reason for not including CHANNEL_NAME and WORKER_ID is, multi-source
On 21/03/21 8:21 pm, Sergei Golubchik wrote: based parallel replication is bit different in MySQL.
Please find the following information.
==> https://dev.mysql.com/doc/refman/5.7/en/replication-channels.html
A multi-source replica can also be set up as a multi-threaded replica, by setting the slave_parallel_workers system variable to a value greater than 0. When you do this on a multi-source replica, each channel on the replica has the specified number of applier threads, plus a coordinator thread to manage them. You cannot configure the number of applier threads for individual channels.
In case of MySQL, worker threads are dedicated to particular channel.
In MariaDB "The pool of replication worker threads is shared among all multi-source master connections, and among all replication domains that can replicate in parallel using out-of-order".
Because of this I didn't include CHANNEL_NAME. In MariaDB Slave_worker thread's don't have WORKER_ID they only have 'thread_id'. Okay, thanks
May be it'd still make sense to keep CHANNEL_NAME, for the connection name the worker is currently working for? Even if the pool is shared, at any given moment in time the worker can apply a transaction only from some specific connection, right?
Sure. The 'Connection_Name' can be printed.
3. Add backup.
With these three commits you'll have exactly the same diff as now, just split (and with CHANNEL_NAME column).
But really, I wonder whether this backup functionality is needed at all? In MySQL there is persistent information about these workers, it doesn't go away when they're stopped, it's stored persistently in a table, indexed by worker_id. If we don't have anything persistent like that and all workers completely disappear into oblivion, then, may be, replication_applier_status_by_worker should not show anything when they aren't running? MySQL doesn't persist the worker information in a table. When workers are stopped due to an error or STOP SLAVE, worker information is copied(backup) and it is retained till next START SLAVE. Please find following snippets. Well, there is mysql.slave_worker_info, it's persistent.
Yes, you are right `mysql.slave_worker_info` persists data, for server internal usage purpose. The data which gets printed through 'performance_schema.replication_applier_status_worker' is not persisted. For example PFS tables can provide more details for the user either on stop/error though backup. mysql> select * from mysql.slave_worker_info; +----+--------------------------+---------------+-------------------+----------------+---------------------------+--------------------------+----------------------------+---------------------------+------------------+-----------------------+------------------------------------------------------------------+--------------+ | Id | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos | Checkpoint_relay_log_name | Checkpoint_relay_log_pos | Checkpoint_master_log_name | Checkpoint_master_log_pos | Checkpoint_seqno | Checkpoint_group_size | Checkpoint_group_bitmap | Channel_name | +----+--------------------------+---------------+-------------------+----------------+---------------------------+--------------------------+----------------------------+---------------------------+------------------+-----------------------+------------------------------------------------------------------+--------------+ | 1 | | 0 | | 0 | | 0 | | 0 | 0 | 64 | | | | 2 | | 0 | | 0 | | 0 | | 0 | 0 | 64 | | | | 3 | | 0 | | 0 | | 0 | | 0 | 0 | 64 | | | | 4 | ./slave-relay-bin.000002 | 810 | master-bin.000001 | 595 | ./slave-relay-bin.000002 | 369 | master-bin.000001 | 154 | 1 | 64 | | | +----+--------------------------+---------------+-------------------+----------------+---------------------------+--------------------------+----------------------------+---------------------------+------------------+-----------------------+------------------------------------------------------------------+--------------+ 4 rows in set (0.00 sec) mysql> select * from performance_schema.replication_applier_status_by_worker; +--------------+-----------+-----------+---------------+-----------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+ | CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP | +--------------+-----------+-----------+---------------+-----------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+ | | 1 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 2 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 3 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 4 | NULL | OFF | ANONYMOUS | 1062 | Worker 4 failed executing transaction 'ANONYMOUS' at master log master-bin.000001, end_log_pos 817; Could not execute Write_rows event on table test.t1; Duplicate entry '30' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log master-bin.000001, end_log_pos 817 | 2021-03-22 15:44:00 | +--------------+-----------+-----------+---------------+-----------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+ 4 rows in set (0.00 sec) Thank you S.Sujatha
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi, Sujatha! On Mar 22, sujatha wrote:
1. Just add replication_applier_status_by_worker table. With CHANNEL_NAME column and (perhaps, not sure if it applies) with WORKER_ID column. Without extra columns and backup. .. In case of MySQL, worker threads are dedicated to particular channel.
In MariaDB "The pool of replication worker threads is shared among all multi-source master connections, and among all replication domains that can replicate in parallel using out-of-order".
Because of this I didn't include CHANNEL_NAME. In MariaDB Slave_worker thread's don't have WORKER_ID they only have 'thread_id'. Okay, thanks
May be it'd still make sense to keep CHANNEL_NAME, for the connection name the worker is currently working for? Even if the pool is shared, at any given moment in time the worker can apply a transaction only from some specific connection, right?
Sure. The 'Connection_Name' can be printed.
Thanks. But keep the CHANNEL_NAME column name for consistency. In other P_S tables CHANNEL_NAME column is used to show the connection name.
MySQL doesn't persist the worker information in a table. When workers are stopped due to an error or STOP SLAVE, worker information is copied(backup) and it is retained till next START SLAVE. Please find following snippets. Well, there is mysql.slave_worker_info, it's persistent. Yes, you are right `mysql.slave_worker_info` persists data, for server internal usage purpose.
I see, thanks. Then, indeed, we should do this backup-for-PS thing too. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
participants (2)
-
Sergei Golubchik
-
sujatha