Hi, Robert! On Aug 28, Robert Bindar wrote:
revision-id: 5ba0b11c5d1 (mariadb-10.4.7-35-g5ba0b11c5d1) parent(s): 4a490d1a993 author: Robert Bindar <robert@mariadb.org> committer: Robert Bindar <robert@mariadb.org> timestamp: 2019-08-27 11:38:52 +0300 message:
MDEV-20257 Fix crash in Grant_table_base::init_read_record
The bug shows up when a 10.4+ server starts on a <10.4 datadir with a crashed mysql.user table. Grant_tables::open_and_lock tries to open the list of requested tables (host, db, global_priv,..) and because global_priv doesn't exist (pre-10.4 datadir), it falls back to opening mysql.user. The first open_tables call for [host, db, global_priv,..] works just fine. The second open_tables call (trying to open mysql.user alone) sees the crashed user table and performs 3 steps: - closes all the open tables - attempts a table fix for mysql.user (succeeds) - opens the tables initially passed as arguments In an ideal world, the first step should only close the tables passed as arguments (i.e. mysql.user), but for some reasons close_tables_for_reopen makes a close_thread_tables call which closes everything in THD::open_tables (nevertheless, this is probably the intended behavior). This side effect causes all the tables opened in the first open_tables call to now be closed and Grant_table_base::init_read_record tries to read from a released table.
diff --git a/mysql-test/main/mdev20257.test b/mysql-test/main/mdev20257.test new file mode 100644 index 00000000000..8506292d57c --- /dev/null +++ b/mysql-test/main/mdev20257.test @@ -0,0 +1,51 @@ +--source include/not_embedded.inc +--echo # +--echo # MDEV 20257 Server crashes in Grant_table_base::init_read_record +--echo # upon crash-upgrade +--echo # + +let $MYSQLD_DATADIR= `select @@datadir`; + +rename table mysql.global_priv to mysql.global_priv_bak; +rename table mysql.user to mysql.user_bak; +rename table mysql.db to mysql.db_bak; +rename table mysql.proxies_priv to mysql.proxies_priv_bak; +rename table mysql.roles_mapping to mysql.roles_mapping_bak; + +--source include/shutdown_mysqld.inc + +# Bring in a crashed user table +# Ideally we should've copied only the crashed mysql.user, but this +# would make mysqld crash in some other place before hitting the +# crash spot described in MDEV-20257 which is what we're trying to +# test against.
I don't understand that. Where does it crash then? What difference does it make that you replace all tables - are they all corrupted?
+--copy_file std_data/mdev20257/user.frm $MYSQLD_DATADIR/mysql/user.frm +--copy_file std_data/mdev20257/user.MYD $MYSQLD_DATADIR/mysql/user.MYD +--copy_file std_data/mdev20257/user.MYI $MYSQLD_DATADIR/mysql/user.MYI + +--copy_file std_data/mdev20257/host.frm $MYSQLD_DATADIR/mysql/host.frm +--copy_file std_data/mdev20257/host.MYD $MYSQLD_DATADIR/mysql/host.MYD +--copy_file std_data/mdev20257/host.MYI $MYSQLD_DATADIR/mysql/host.MYI + +--copy_file std_data/mdev20257/db.frm $MYSQLD_DATADIR/mysql/db.frm +--copy_file std_data/mdev20257/db.MYD $MYSQLD_DATADIR/mysql/db.MYD +--copy_file std_data/mdev20257/db.MYI $MYSQLD_DATADIR/mysql/db.MYI +--copy_file std_data/mdev20257/db.opt $MYSQLD_DATADIR/mysql/db.opt + +--source include/start_mysqld.inc + +call mtr.add_suppression("Table \'\..mysql\.user\' is marked as crashed and should be repaired"); +call mtr.add_suppression("Checking table: \'\..mysql\.user\'"); +call mtr.add_suppression("mysql.user: 1 client is using or hasn't closed the table properly"); +call mtr.add_suppression("Missing system table mysql..*; please run mysql_upgrade to create it"); + +# Cleanup +drop table mysql.user; +drop table mysql.db; +drop table mysql.host; +rename table mysql.global_priv_bak to mysql.global_priv; +rename table mysql.user_bak to mysql.user; +rename table mysql.db_bak to mysql.db; +rename table mysql.proxies_priv_bak to mysql.proxies_priv; +rename table mysql.roles_mapping_bak to mysql.roles_mapping; + diff --git a/mysql-test/std_data/mdev20257/db.opt b/mysql-test/std_data/mdev20257/db.opt --- /dev/null +++ b/mysql-test/std_data/mdev20257/db.opt @@ -0,0 +1,2 @@ +default-character-set=latin1 +default-collation=latin1_swedish_ci
why is that?
diff --git a/sql/sql_acl.cc b/sql/sql_acl.cc index 847d2bd777b..e028d084703 100644 --- a/sql/sql_acl.cc +++ b/sql/sql_acl.cc - DBUG_RETURN(res); + first = build_open_list(tables, which_tables, lock_type, true); + + uint counter; + if (int rv= really_open(thd, first, &counter)) + DBUG_RETURN(rv); + + TABLE *user_table= tables[USER_TABLE].table; + if ((which_tables & Table_user) && !user_table) + { + close_thread_tables(thd); + first = build_open_list(tables, which_tables, lock_type, false);
Hmm... Here you always close/reopen all tables, just to account for the case when the user table is found corrupted. I'd argue that a corrupted user table happens rarely (compared to the non-corrupted), so it makes sense to optimize for the normal use case. That is, only open the user table as before. After it's opened you check if other tables are still opened - if they aren't you can close and reopen everything. This way you only close/reopen all tables if the user table was corrupted. Makes sense?
+ if (int rv= really_open(thd, first, &counter)) + DBUG_RETURN(rv); + + p_user_table= &m_user_table_tabular; + user_table= tables[USER_TABLE + 1].table; + }
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org