[Maria-developers] Mdev-10664 Add statuses about optimistic parallel replication stalls.

older
Re: [Maria-developers] Review for...

Sachin Setiya

8 Jan 2017 8 Jan '17

9:34 a.m.

Hi Kristian, I am stuck at one problem in Mdev-10664. Suppose there is multisource replication('master1' and 'master2' ) and we want to update status var, How to know which master_info to update ?. Does slave threads have current replication channel name? (I was not able to find any , I have looked in rpl_group_info) I am attaching patch file, only above problem need to solved in patch. -- Regards Sachin Setiya Software Engineer at MariaDB

Attachments:

mdev-10664.patch (text/x-patch — 11.8 KB)

Show replies by date

Kristian Nielsen

8 Jan 8 Jan

11:39 a.m.

Sachin Setiya <sachin.setiya@mariadb.com> writes:

...

I am stuck at one problem in Mdev-10664. Suppose there is multisource replication('master1' and 'master2' ) and we want to update status var, How to know which master_info to update ?. Does slave threads have current replication channel name? (I was not able to find any , I have looked in rpl_group_info)

rgi->rli->mi

...

diff --git a/mysql-test/suite/multi_source/multi_parallel.test b/mysql-test/suite/multi_source/multi_parallel.test new file mode 100644 index 0000000..e27345d --- /dev/null +++ b/mysql-test/suite/multi_source/multi_parallel.test

...

+--replace_result $SERVER_MYPORT_1 MYPORT_1 +eval change master 'master1' to +master_port=$SERVER_MYPORT_1, +master_host='127.0.0.1', +master_user='root', +master_heartbeat_period = 25; + + + +--replace_result $SERVER_MYPORT_2 MYPORT_2 +eval change master 'master2' to +master_port=$SERVER_MYPORT_2, +master_host='127.0.0.1', +master_user='root', +master_heartbeat_period=35;

Why do you set heartbeat_period here?

...

+show variables like 'slave%';

Why do you need to show all these variables? It is very likely to cause the test to fail in some environments where the output may differ for whatever reason.

...

+# Cleanup +stop all slaves; +--source include/wait_for_slave_to_stop.inc

This will only wait for one slave to stop, won't it?

...

diff --git a/sql/log_event.cc b/sql/log_event.cc index ced2626..c041116 100644 --- a/sql/log_event.cc +++ b/sql/log_event.cc @@ -6780,6 +6780,17 @@ Gtid_log_event::do_apply_event(rpl_group_info *rgi) }

DBUG_ASSERT((bits & OPTION_GTID_BEGIN) == 0); + mysql_mutex_lock(&LOCK_active_mi);

Ouch! You are now taking an expensive global lock for each and every transaction :-( Surely this will hurt parallel replication performance. This seems completely unnecessary. For example, if no flags are set you are not doing anything here, so no lock needed at all. And even if a flag is set, there is no need to take LOCK_active_mi, the Master_info cannot go away in the middle of do_apply_event(). According to my understanding, incremention a status variable should not require taking a global mutex at all.

...

+ Master_info *mi= master_info_index->get_master_info(&thd->variables.default_master_connection, + Sql_condition::WARN_LEVEL_NOTE); + if (mi) + { + if (flags2 & FL_DDL) + mi->total_ddl_statements++; + if (!(flags & FL_TRANSACTIONAL)) + mi->total_non_trans_statements++;

The name "statements" is very confusing here, since you are counting event groups, not individual statements. Did you consider writing documentation for the feature? Writing the documentation to explain to users exactly how the feature work could help clarify such issues for yourself also, before coding...

...

diff --git a/sql/mysqld.cc b/sql/mysqld.cc index adb1b27..5fcaf34 100644 --- a/sql/mysqld.cc +++ b/sql/mysqld.cc

...

@@ -8473,6 +8525,8 @@ SHOW_VAR status_vars[]= { {"Slave_retried_transactions",(char*)&slave_retried_transactions, SHOW_LONG}, {"Slave_running", (char*) &show_slave_running, SHOW_SIMPLE_FUNC}, {"Slave_skipped_errors", (char*) &slave_skipped_errors, SHOW_LONGLONG}, + {"Slave_DDL_statements", (char*) &show_slave_ddl_stmts, SHOW_SIMPLE_FUNC}, + {"Slave_Non_transactional_statements", (char *) &show_slave_non_trans_stmts, SHOW_SIMPLE_FUNC},

Again, this is not statements, it is event groups. Or "transactions" if you like, which is a term more familiar to users, though a "non-transactional transaction" is also not perfect terminology. - Kristian.

Sachin Setiya

3:09 p.m.

New subject: [Maria-developers] Mdev-10664 Add statuses about optimistic parallel replication stalls.

Hi Kristian! On Sun, Jan 8, 2017 at 4:09 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:

...

Sachin Setiya <sachin.setiya@mariadb.com> writes:

...
I am stuck at one problem in Mdev-10664. Suppose there is multisource replication('master1' and 'master2' ) and we want to update status var, How to know which master_info to update ?. Does slave threads have current replication channel name? (I was not able to find any , I have looked in rpl_group_info)

rgi->rli->mi Thanks.

...
diff --git a/mysql-test/suite/multi_source/multi_parallel.test b/mysql-test/suite/multi_source/multi_parallel.test new file mode 100644 index 0000000..e27345d --- /dev/null +++ b/mysql-test/suite/multi_source/multi_parallel.test

...
+--replace_result $SERVER_MYPORT_1 MYPORT_1 +eval change master 'master1' to +master_port=$SERVER_MYPORT_1, +master_host='127.0.0.1', +master_user='root', +master_heartbeat_period = 25; + + + +--replace_result $SERVER_MYPORT_2 MYPORT_2 +eval change master 'master2' to +master_port=$SERVER_MYPORT_2, +master_host='127.0.0.1', +master_user='root', +master_heartbeat_period=35;

Why do you set heartbeat_period here?

I was copying from other tests in multi_source, so I copied it without thinking, sorry Reverted.

...

...
+show variables like 'slave%';

Why do you need to show all these variables? It is very likely to cause the test to fail in some environments where the output may differ for whatever reason.

Sorry for this,I forgot to remove it , I was experimenting with slave_parallel_mode and slave_parallel threads , so I added this To my surprise slave_parallel_mode is replication channel specific , while slave_parallel threadsis a global variable, Why So ? And in mariadb documentation I was unable to find which variable is replication channel specific and which variable is global one. Reverted

...

...
+# Cleanup +stop all slaves; +--source include/wait_for_slave_to_stop.inc

This will only wait for one slave to stop, won't it?

yup, changed.

...

...
diff --git a/sql/log_event.cc b/sql/log_event.cc index ced2626..c041116 100644 --- a/sql/log_event.cc +++ b/sql/log_event.cc @@ -6780,6 +6780,17 @@ Gtid_log_event::do_apply_event(rpl_group_info *rgi) }

DBUG_ASSERT((bits & OPTION_GTID_BEGIN) == 0); + mysql_mutex_lock(&LOCK_active_mi);

Ouch! You are now taking an expensive global lock for each and every transaction :-( Surely this will hurt parallel replication performance.

Actually I was thinking the same, But I do not know how to get the master info So I used the same approch as in method show_slave_ddl_stmts. I no longer use mutex.

...

This seems completely unnecessary. For example, if no flags are set you are not doing anything here, so no lock needed at all. And even if a flag is set, there is no need to take LOCK_active_mi, the Master_info cannot go away in the middle of do_apply_event(). According to my understanding, incremention a status variable should not require taking a global mutex at all. Yep, because if there is DDL statement or non transnational statement it is executed alone. So no need of mutex.

...
+ Master_info *mi= master_info_index->get_master_info(&thd->variables.default_master_connection, + Sql_condition::WARN_LEVEL_NOTE); + if (mi) + { + if (flags2 & FL_DDL) + mi->total_ddl_statements++; + if (!(flags & FL_TRANSACTIONAL)) + mi->total_non_trans_statements++;

The name "statements" is very confusing here, since you are counting event groups, not individual statements.

Changed It to event total_ddl_events , total_non_trans_events

...

Did you consider writing documentation for the feature? Writing the documentation to explain to users exactly how the feature work could help clarify such issues for yourself also, before coding...

Never tried it , Will try it for next bug/task.

...

...
diff --git a/sql/mysqld.cc b/sql/mysqld.cc index adb1b27..5fcaf34 100644 --- a/sql/mysqld.cc +++ b/sql/mysqld.cc

...
@@ -8473,6 +8525,8 @@ SHOW_VAR status_vars[]= { {"Slave_retried_transactions",(char*)&slave_retried_transactions, SHOW_LONG}, {"Slave_running", (char*) &show_slave_running, SHOW_SIMPLE_FUNC}, {"Slave_skipped_errors", (char*) &slave_skipped_errors, SHOW_LONGLONG}, + {"Slave_DDL_statements", (char*) &show_slave_ddl_stmts, SHOW_SIMPLE_FUNC}, + {"Slave_Non_transactional_statements", (char *) &show_slave_non_trans_stmts, SHOW_SIMPLE_FUNC},

Again, this is not statements, it is event groups. Or "transactions" if you like, which is a term more familiar to users, though a "non-transactional transaction" is also not perfect terminology.

Used Event.

...

- Kristian.

-- Regards sachin

Kristian Nielsen

5:59 p.m.

Sachin Setiya <sachin.setiya@mariadb.com> writes:

...

To my surprise slave_parallel_mode is replication channel specific , while slave_parallel threadsis a global variable, Why So ?

There is a single pool of worker threads, to enable threads to be shared among multi-source connections - for example so all threads can work on one busy connection while another is idle. That is why slave_parallel_threads must be global. In contrast, it is perfectly possible to run one multi-source connection in conservative mode, and another in optimistic, for example.

...

And in mariadb documentation I was unable to find which variable is replication channel specific and which variable is global one.

Hm, if it's not documented that slave_parallel_mode is per-channel, that is unfortunate (this is something that was introduced for 10.1).

...

Yep, because if there is DDL statement or non transnational statement it is executed alone. So no need of mutex.

Right, but you still need _some_ way to synchronise multiple threads incrementing the counter simultaneously (it is perfectly possible for two non-transactional event groups to execute concurrently in conservative mode, for example).

...

...
The name "statements" is very confusing here, since you are counting event groups, not individual statements.

Changed It to event total_ddl_events , total_non_trans_events

It's not individual events. It is "event groups", though this is mostly a terminology internally in the source code. The term "transaction" is unfortunate for something that is non-transactional. The name "event group" is also bad, as it is unfamiliar to users, I think. Maybe "total_ddl_events" is ok. I think in most cases only one statement at a time can be in an event group, when it is non-transactional or DDL. Though multiple binlog events will be involved for the single statement.

...

From this, _statement might also be ok, though it is somewhat unfortunate in case of row-based replication, where no SQL statements are involved at all on the slave.

So baring better ideas, I guess *_events is ok, just be sure to make it clear in the documentation exactly what is counted. - Kristian.

Jean-Francois B. Gagne

4:09 p.m.

New subject: [Maria-developers] [External] Mdev-10664 Add statuses about optimistic parallel replication stalls.

Hi Sachin, I am very happy to see you work on that. Comments below. On 8 January 2017 at 09:34, Sachin Setiya <sachin.setiya@mariadb.com> wrote:

...

I am stuck at one problem in Mdev-10664. Suppose there is multisource replication('master1' and 'master2' ) and we want to update status var, How to know which master_info to update ?. Does slave threads have current replication channel name? (I was not able to find any , I have looked in rpl_group_info)

By updating master_info above you mean updating the master_info RAM structre, right ? Not the master_info file... The easier way to give visibility on parallel replication stalls is to add a global status variable like slave_retried_transactions (slave_optimistic_stalls ?). However, this hides which multi-source channel generated that stall (or retried transaction). Is there a way to show this information in SHOW SLAVE STATUS or in Performance Schema (Oracle MySQL is exposing per-channel counter in P_S I thing...) ? While you are working on this, might I suggest that you also expose something like what BINLOG_COMMIT and BINLOG_GROUP_COMMIT the slave is seeing from the master in conservative mode (and per-channel). Today, to monitor parallel replication, we need to get this information from the master and maps it to the position including lag) of the slave. It would be great to be able to monitor a slave only by looking at the slave (something like slave_binlog_commit_master and slave_binlog_group_commit_master global status would be a 1st start). Many thanks JFG

Sachin Setiya

5:08 p.m.

New subject: [Maria-developers] [External] Mdev-10664 Add statuses about optimistic parallel replication stalls.

Hi Jean-Francois B. Gagne, On Sun, Jan 8, 2017 at 8:39 PM, Jean-Francois B. Gagne <jeanfrancois.gagne@booking.com> wrote:

...

Hi Sachin,

I am very happy to see you work on that. Comments below.

On 8 January 2017 at 09:34, Sachin Setiya <sachin.setiya@mariadb.com> wrote:

...
I am stuck at one problem in Mdev-10664. Suppose there is multisource replication('master1' and 'master2' ) and we want to update status var, How to know which master_info to update ?. Does slave threads have current replication channel name? (I was not able to find any , I have looked in rpl_group_info)

By updating master_info above you mean updating the master_info RAM structre, right ? Not the master_info file... Yes.

The easier way to give visibility on parallel replication stalls is to add a global status variable like slave_retried_transactions (slave_optimistic_stalls ?). However, this hides which multi-source channel generated that stall (or retried transaction). Is there a way to show this information in SHOW SLAVE STATUS or in Performance Schema (Oracle MySQL is exposing per-channel counter in P_S I thing...) ? Yes this can be done. Kristian what you think ?

While you are working on this, might I suggest that you also expose something like what BINLOG_COMMIT and BINLOG_GROUP_COMMIT the slave is seeing from the master in conservative mode (and per-channel). Today, to monitor parallel replication, we need to get this information from the master and maps it to the position including lag) of the slave. It would be great to be able to monitor a slave only by looking at the slave (something like slave_binlog_commit_master and slave_binlog_group_commit_master global status would be a 1st start). Created a mdev for this task MDEV-11744. I am not sure how to do this. It will take some time.

Many thanks

JFG

Regards sachin

3106

Age (days ago)

3106

Last active (days ago)

List overview

5 comments

3 participants

participants (3)

Jean-Francois B. Gagne
Kristian Nielsen
Sachin Setiya