Re: [Maria-developers] 5ba0b11c5d1: MDEV-20257 Fix crash in Grant_table_base::init_read_record
Hi, Robert! On Aug 28, Robert Bindar wrote:
revision-id: 5ba0b11c5d1 (mariadb-10.4.7-35-g5ba0b11c5d1) parent(s): 4a490d1a993 author: Robert Bindar <robert@mariadb.org> committer: Robert Bindar <robert@mariadb.org> timestamp: 2019-08-27 11:38:52 +0300 message:
MDEV-20257 Fix crash in Grant_table_base::init_read_record
The bug shows up when a 10.4+ server starts on a <10.4 datadir with a crashed mysql.user table. Grant_tables::open_and_lock tries to open the list of requested tables (host, db, global_priv,..) and because global_priv doesn't exist (pre-10.4 datadir), it falls back to opening mysql.user. The first open_tables call for [host, db, global_priv,..] works just fine. The second open_tables call (trying to open mysql.user alone) sees the crashed user table and performs 3 steps: - closes all the open tables - attempts a table fix for mysql.user (succeeds) - opens the tables initially passed as arguments In an ideal world, the first step should only close the tables passed as arguments (i.e. mysql.user), but for some reasons close_tables_for_reopen makes a close_thread_tables call which closes everything in THD::open_tables (nevertheless, this is probably the intended behavior). This side effect causes all the tables opened in the first open_tables call to now be closed and Grant_table_base::init_read_record tries to read from a released table.
diff --git a/mysql-test/main/mdev20257.test b/mysql-test/main/mdev20257.test new file mode 100644 index 00000000000..8506292d57c --- /dev/null +++ b/mysql-test/main/mdev20257.test @@ -0,0 +1,51 @@ +--source include/not_embedded.inc +--echo # +--echo # MDEV 20257 Server crashes in Grant_table_base::init_read_record +--echo # upon crash-upgrade +--echo # + +let $MYSQLD_DATADIR= `select @@datadir`; + +rename table mysql.global_priv to mysql.global_priv_bak; +rename table mysql.user to mysql.user_bak; +rename table mysql.db to mysql.db_bak; +rename table mysql.proxies_priv to mysql.proxies_priv_bak; +rename table mysql.roles_mapping to mysql.roles_mapping_bak; + +--source include/shutdown_mysqld.inc + +# Bring in a crashed user table +# Ideally we should've copied only the crashed mysql.user, but this +# would make mysqld crash in some other place before hitting the +# crash spot described in MDEV-20257 which is what we're trying to +# test against.
I don't understand that. Where does it crash then? What difference does it make that you replace all tables - are they all corrupted?
+--copy_file std_data/mdev20257/user.frm $MYSQLD_DATADIR/mysql/user.frm +--copy_file std_data/mdev20257/user.MYD $MYSQLD_DATADIR/mysql/user.MYD +--copy_file std_data/mdev20257/user.MYI $MYSQLD_DATADIR/mysql/user.MYI + +--copy_file std_data/mdev20257/host.frm $MYSQLD_DATADIR/mysql/host.frm +--copy_file std_data/mdev20257/host.MYD $MYSQLD_DATADIR/mysql/host.MYD +--copy_file std_data/mdev20257/host.MYI $MYSQLD_DATADIR/mysql/host.MYI + +--copy_file std_data/mdev20257/db.frm $MYSQLD_DATADIR/mysql/db.frm +--copy_file std_data/mdev20257/db.MYD $MYSQLD_DATADIR/mysql/db.MYD +--copy_file std_data/mdev20257/db.MYI $MYSQLD_DATADIR/mysql/db.MYI +--copy_file std_data/mdev20257/db.opt $MYSQLD_DATADIR/mysql/db.opt + +--source include/start_mysqld.inc + +call mtr.add_suppression("Table \'\..mysql\.user\' is marked as crashed and should be repaired"); +call mtr.add_suppression("Checking table: \'\..mysql\.user\'"); +call mtr.add_suppression("mysql.user: 1 client is using or hasn't closed the table properly"); +call mtr.add_suppression("Missing system table mysql..*; please run mysql_upgrade to create it"); + +# Cleanup +drop table mysql.user; +drop table mysql.db; +drop table mysql.host; +rename table mysql.global_priv_bak to mysql.global_priv; +rename table mysql.user_bak to mysql.user; +rename table mysql.db_bak to mysql.db; +rename table mysql.proxies_priv_bak to mysql.proxies_priv; +rename table mysql.roles_mapping_bak to mysql.roles_mapping; + diff --git a/mysql-test/std_data/mdev20257/db.opt b/mysql-test/std_data/mdev20257/db.opt --- /dev/null +++ b/mysql-test/std_data/mdev20257/db.opt @@ -0,0 +1,2 @@ +default-character-set=latin1 +default-collation=latin1_swedish_ci
why is that?
diff --git a/sql/sql_acl.cc b/sql/sql_acl.cc index 847d2bd777b..e028d084703 100644 --- a/sql/sql_acl.cc +++ b/sql/sql_acl.cc - DBUG_RETURN(res); + first = build_open_list(tables, which_tables, lock_type, true); + + uint counter; + if (int rv= really_open(thd, first, &counter)) + DBUG_RETURN(rv); + + TABLE *user_table= tables[USER_TABLE].table; + if ((which_tables & Table_user) && !user_table) + { + close_thread_tables(thd); + first = build_open_list(tables, which_tables, lock_type, false);
Hmm... Here you always close/reopen all tables, just to account for the case when the user table is found corrupted. I'd argue that a corrupted user table happens rarely (compared to the non-corrupted), so it makes sense to optimize for the normal use case. That is, only open the user table as before. After it's opened you check if other tables are still opened - if they aren't you can close and reopen everything. This way you only close/reopen all tables if the user table was corrupted. Makes sense?
+ if (int rv= really_open(thd, first, &counter)) + DBUG_RETURN(rv); + + p_user_table= &m_user_table_tabular; + user_table= tables[USER_TABLE + 1].table; + }
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi Sergei, thanks for looking into this. On Wed, Aug 28, 2019 at 7:54 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Robert!
On Aug 28, Robert Bindar wrote:
revision-id: 5ba0b11c5d1 (mariadb-10.4.7-35-g5ba0b11c5d1) parent(s): 4a490d1a993 author: Robert Bindar <robert@mariadb.org> committer: Robert Bindar <robert@mariadb.org> timestamp: 2019-08-27 11:38:52 +0300 message:
MDEV-20257 Fix crash in Grant_table_base::init_read_record
The bug shows up when a 10.4+ server starts on a <10.4 datadir with a crashed mysql.user table. Grant_tables::open_and_lock tries to open the list of requested tables (host, db, global_priv,..) and because global_priv doesn't exist (pre-10.4 datadir), it falls back to opening mysql.user. The first open_tables call for [host, db, global_priv,..] works just fine. The second open_tables call (trying to open mysql.user alone) sees the crashed user table and performs 3 steps: - closes all the open tables - attempts a table fix for mysql.user (succeeds) - opens the tables initially passed as arguments In an ideal world, the first step should only close the tables passed as arguments (i.e. mysql.user), but for some reasons close_tables_for_reopen makes a close_thread_tables call which closes everything in THD::open_tables (nevertheless, this is probably the intended behavior). This side effect causes all the tables opened in the first open_tables call to now be closed and Grant_table_base::init_read_record tries to read from a released table.
diff --git a/mysql-test/main/mdev20257.test b/mysql-test/main/mdev20257.test new file mode 100644 index 00000000000..8506292d57c --- /dev/null +++ b/mysql-test/main/mdev20257.test @@ -0,0 +1,51 @@ +--source include/not_embedded.inc +--echo # +--echo # MDEV 20257 Server crashes in Grant_table_base::init_read_record +--echo # upon crash-upgrade +--echo # + +let $MYSQLD_DATADIR= `select @@datadir`; + +rename table mysql.global_priv to mysql.global_priv_bak; +rename table mysql.user to mysql.user_bak; +rename table mysql.db to mysql.db_bak; +rename table mysql.proxies_priv to mysql.proxies_priv_bak; +rename table mysql.roles_mapping to mysql.roles_mapping_bak; + +--source include/shutdown_mysqld.inc + +# Bring in a crashed user table +# Ideally we should've copied only the crashed mysql.user, but this +# would make mysqld crash in some other place before hitting the +# crash spot described in MDEV-20257 which is what we're trying to +# test against.
I don't understand that. Where does it crash then? What difference does it make that you replace all tables - are they all corrupted?
Only mysql.user is corrupted. Assuming we take out the fix in acl, if we don't replace the db table with the 10.2 version provided by Elena, the server crashes a bit earlier in lock_tables (called from Grant_tables::open_and_lock) for reasons I don't yet understand, my guess is it has something to do with the aria/myisam change between 10.2 and 10.4. The host table needs to be copied because the reported crash happened when acl_load tried to read the host table and in 10.4 it is not there anymore. And for proxies_priv and roles_mapping, it is enough if we just hide them (otherwise the same Aria crash happens).
+--copy_file std_data/mdev20257/user.frm $MYSQLD_DATADIR/mysql/user.frm +--copy_file std_data/mdev20257/user.MYD $MYSQLD_DATADIR/mysql/user.MYD +--copy_file std_data/mdev20257/user.MYI $MYSQLD_DATADIR/mysql/user.MYI + +--copy_file std_data/mdev20257/host.frm $MYSQLD_DATADIR/mysql/host.frm +--copy_file std_data/mdev20257/host.MYD $MYSQLD_DATADIR/mysql/host.MYD +--copy_file std_data/mdev20257/host.MYI $MYSQLD_DATADIR/mysql/host.MYI + +--copy_file std_data/mdev20257/db.frm $MYSQLD_DATADIR/mysql/db.frm +--copy_file std_data/mdev20257/db.MYD $MYSQLD_DATADIR/mysql/db.MYD +--copy_file std_data/mdev20257/db.MYI $MYSQLD_DATADIR/mysql/db.MYI +--copy_file std_data/mdev20257/db.opt $MYSQLD_DATADIR/mysql/db.opt + +--source include/start_mysqld.inc + +call mtr.add_suppression("Table \'\..mysql\.user\' is marked as crashed and should be repaired"); +call mtr.add_suppression("Checking table: \'\..mysql\.user\'"); +call mtr.add_suppression("mysql.user: 1 client is using or hasn't closed the table properly"); +call mtr.add_suppression("Missing system table mysql..*; please run mysql_upgrade to create it"); + +# Cleanup +drop table mysql.user; +drop table mysql.db; +drop table mysql.host; +rename table mysql.global_priv_bak to mysql.global_priv; +rename table mysql.user_bak to mysql.user; +rename table mysql.db_bak to mysql.db; +rename table mysql.proxies_priv_bak to mysql.proxies_priv; +rename table mysql.roles_mapping_bak to mysql.roles_mapping; + diff --git a/mysql-test/std_data/mdev20257/db.opt b/mysql-test/std_data/mdev20257/db.opt --- /dev/null +++ b/mysql-test/std_data/mdev20257/db.opt @@ -0,0 +1,2 @@ +default-character-set=latin1 +default-collation=latin1_swedish_ci
why is that?
I agree we don't need db.opt at all.
diff --git a/sql/sql_acl.cc b/sql/sql_acl.cc index 847d2bd777b..e028d084703 100644 --- a/sql/sql_acl.cc +++ b/sql/sql_acl.cc - DBUG_RETURN(res); + first = build_open_list(tables, which_tables, lock_type, true); + + uint counter; + if (int rv= really_open(thd, first, &counter)) + DBUG_RETURN(rv); + + TABLE *user_table= tables[USER_TABLE].table; + if ((which_tables & Table_user) && !user_table) + { + close_thread_tables(thd); + first = build_open_list(tables, which_tables, lock_type, false);
Hmm... Here you always close/reopen all tables, just to account for the case when the user table is found corrupted.
I'd argue that a corrupted user table happens rarely (compared to the non-corrupted), so it makes sense to optimize for the normal use case.
That is, only open the user table as before. After it's opened you check if other tables are still opened - if they aren't you can close and reopen everything. This way you only close/reopen all tables if the user table was corrupted.
Makes sense?
It makes sense. Is there any other way to check if a table is opened
except for going through thd->open_tables and see if the table is there? Touching TABLE_LIST::table that we have in open_and_lock might not be a good idea if the table instance is already released to table cache.
+ if (int rv= really_open(thd, first, &counter)) + DBUG_RETURN(rv); + + p_user_table= &m_user_table_tabular; + user_table= tables[USER_TABLE + 1].table; + }
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
participants (2)
-
Robert Bindar
-
Sergei Golubchik