[Maria-discuss] ALTER TABLE ENIGNE=InnoDB performance
Hi, We're looking at changing an existing database from MyISAM to InnoDB. The existing system is using MariaDB 5.5.56 on RHEL 7.4 The plan is to - create a replication slave on the same host - use ALTER TABLE .. ENGINE=InnoDB on the slave, with 4 ALTERs running concurrently - replace the master with the "coverted" slave Doing this with a small database (~ 3.5GB) works okay with the largest table taking ~ 7 minutes to convert. However, while trying to figure out how to replicate from the new master, we found that we need "FLUSH TABLES .. FOR EXPORT" which isn't available in MariaDB 5.5. So we updated to MariaDB 10.2.12 and started again. Now when we run the "ALTER TABLE .. ENIGNE=InnoDB", it takes hours from the ALTERs to complete and we're getting the following log entries 2018-01-09 12:36:03 140132036818688 [ERROR] InnoDB: Cannot save table statistics for table `statsdb`.`#sql-6d1a_12#P#P20160101` /* Partition `P20160101` */: Lock wait timeout 2018-01-09 12:36:54 140132036515584 [ERROR] InnoDB: Cannot save table statistics for table `statsdb`.`#sql-6d1a_13#P#P20160101` /* Partition `P20160101` */: Lock wait timeout 2018-01-09 12:37:45 140132036818688 [ERROR] InnoDB: Cannot save table statistics for table `statsdb`.`#sql-6d1a_12#P#P20160601` /* Partition `P20160601` */: Lock wait timeout 2018-01-09 12:38:36 140132036818688 [ERROR] InnoDB: Cannot save table statistics for table `statsdb`.`#sql-6d1a_12#P#P20160701` /* Partition `P20160701` */: Lock wait timeout 2018-01-09 12:39:27 140132037728000 [ERROR] InnoDB: Cannot save table statistics for table `statsdb`.`#sql-6d1a_11#P#P20160701` /* Partition `P20160701` */: Lock wait timeout ... Any ideas on what's wrong here? Thanks, Conor
Hi Conor, The error message for persistent statistics is due to something that I recently changed in MariaDB 10.2.12 and 10.3.3: commit 7dc6066dead562e70a68e6727fe4ee65d0bd0c72
<https://github.com/MariaDB/server/commit/7dc6066dead562e70a68e6727fe4ee65d0bd0c72> Author: Marko Mäkelä <marko.makela@mariadb.com> Date: Fri Dec 1 16:51:24 2017 +0200
MDEV-14511 <https://jira.mariadb.org/browse/MDEV-14511> Use fewer transactions for updating InnoDB persistent statistics
This patch tries to avoid some consistency problems related to InnoDB persistent statistics. It is a long story, but the persistent statistics are being written by an InnoDB internal SQL interpreter that requires the InnoDB data dictionary cache to be locked. Before the above change, the statistics were written during DDL in separate transactions, which could unnecessarily reduce performance (each commit would require a redo log flush) and break atomicity, because the statistics would be updated separately from the dictionary transaction. However, because it is unacceptable to hold the data dictionary cache locked while suspending the execution for waiting for a record lock (in the mysql.innodb_index_stats or mysql.innodb_table_stats tables) to be released, any lock conflict will immediately be reported as "lock wait timeout". As far as I understand, such failures to update statistics did occur earlier as well, but maybe less frequently, because we would really suspend the thread while holding the dictionary lock. It is not the end of the world if the statistics are missing or a bit off. As a matter of fact, I noticed a change in MySQL 5.7.11 <https://github.com/mysql/mysql-server/commit/8bae549264f174a926cd3e52561cad4f31e90f07> (and MariaDB 10.2) that will purposely drop the statistics in a certain case of ALTER TABLE, and rely on a subsequent ha_innobase::open() to update them. It could actually make sense to do this on all ALTER TABLE operations. If we did this, then we would also rename tables in the statistics tables less often (only on RENAME TABLE), and would be less prone to get these errors. I do not think that it makes sense to ever update or rename statistics for the #sql table names. I think that the likelihood of the problem is increased by executing multiple ALTER TABLE statements in parallel. It also "helps" to use partitioning. So, in summary, MariaDB 10.2.12 should not be more broken than it was before, and I would like you to submit an issue at https://jira.mariadb.org/ so that this can be fixed in an upcoming release. On Tue, Jan 9, 2018 at 3:00 PM, Conor Murphy <conor_mark_murphy@hotmail.com> wrote:
Now when we run the "ALTER TABLE .. ENIGNE=InnoDB", it takes hours from the ALTERs to complete and we're getting the following log entries
2018-01-09 12:36:03 140132036818688 [ERROR] InnoDB: Cannot save table statistics for table `statsdb`.`#sql-6d1a_12#P#P20160101` /* Partition `P20160101` */: Lock wait timeout
On a related note, the ALTER TABLE…ALGORITHM=COPY performance for InnoDB should be improved by MDEV-11415 <https://jira.mariadb.org/browse/MDEV-11415> and MDEV-515 <https://jira.mariadb.org/browse/MDEV-515> in later major MariaDB releases. One thing that you could try would be to: ALTER TABLE … DROP INDEX i, DROP INDEX j, …, ENGINE=InnoDB; ALTER TABLE … ADD INDEX (i), ADD INDEX(i), ALGORITHM=INPLACE; Until we have MDEV-515, the ENGINE-converting ALTER will be very slow for tables that contain secondary indexes. So, you should copy the table to InnoDB without those indexes, and then create them separately, using a much faster algorithm. MySQL 5.7 (and MariaDB 10.2) got a more efficient algorithm for creating the indexes page by page. In earlier versions, starting with the InnoDB Plugin for MySQL 5.1, the ADD INDEX would only pre-sort the records of each index, and insert them one by one. Even that is much faster than what the engine-conversion does: insert each row one by one, to each index, without any pre-sorting. With best regards, Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation DON’T MISS M|18 MariaDB User Conference February 26 - 27, 2018 New York City https://m18.mariadb.com/
Hi Marko, Thanks for the info. I re-did the ALTER, running each statement sequentially. This time the statements completed in a timely manner and there wasn't any ERRORs recorded about the table statistics. However, the whole exercise has reached a dead-end as we need to be able to replicate the database and as far as I can tell we need "ALTER TABLE t DISCARD PARTITION ALL TABLESPACE" which isn't available. Regards, Conor
Hi Conor, There is a bug report for the error messages now: https://jira.mariadb.org/browse/MDEV-14941 MDEV-14941 Timeouts on persistent statistics tables after upgrade On Sat, Jan 13, 2018 at 1:30 AM, Conor Murphy <conor_mark_murphy@hotmail.com> wrote:
However, the whole exercise has reached a dead-end as we need to be able to replicate the database and as far as I can tell we need "ALTER TABLE t DISCARD PARTITION ALL TABLESPACE" which isn't available.
MariaDB is indeed missing the ALTER TABLE t DISCARD|IMPORT PARTITION…TABLESPACE syntax. MariaDB is also missing the "MySQL 5.7 native InnoDB partitioning" which probably means that their code change cannot be easily ported. The following commit in MySQL 5.7 added the the syntax: commit 9c4d999381bb8bfcc48921c0ece378f877e47bb3 Author: Mattias Jonsson <mattias.jonsson@oracle.com> Date: Thu Feb 6 14:02:06 2014 +0100 WL#6868: Support transportable tablespaces for single innodb partition. (Based on WL#6867.) Made alter_info.partition_names a List<String> instead of List<char>, so it is the same as table_list.partition_names. Added syntax support for: ALTER TABLE t DISCARD PARTITION <list_of_parts> TABLESPACE ALTER TABLE t IMPORT PARTITION <list_of_parts> TABLESPACE which will only DISCARD/IMPORT the listed partitions. Note that FLUSH FOR EXPORT still works on table level! The following is the merge commit to MySQL 5.7 (which was called mysql-trunk at that time). commit 6be6f3e3820183f315c153dafa12465e6832851d Merge: d168602867c e7fc4dce3ef Author: Mattias Jonsson <mattias.jonsson@oracle.com> Date: Wed Feb 26 10:32:57 2014 +0100 WL#6867: Support transportable tablespaces for partitioned innodb tables. Merge into mysql-trunk. I would not be keen to update the DISCARD/IMPORT TABLESPACE in MariaDB, unless a customer is asking for it. I would rather have faster ALTER TABLE…ALGORITHM=COPY (MDEV-11415) and bulk inserts (MDEV-515) and more easily transportable InnoDB files (just FLUSH TABLES…FOR EXPORT, copy the files, and let the server discover them, similar to how it works with MyISAM). The DISCARD/IMPORT feels like a crude hack to me. DISCARD is breaking referential integrity of the data dictionary, and the adjustments during IMPORT (which could be avoided with some file format changes) are neither atomic nor crash-safe. -- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation DON’T MISS M|18 MariaDB User Conference February 26 - 27, 2018 New York City https://m18.mariadb.com/
Hi Marko, Thanks for the update. It's not that we're looking specifically for the "ALTER TABLE t DISCARD PARTITION ALL TABLESPACE", we're just looking for a way to setup the initial copy of the database for the replication slave. Some of the tables in our database are large (billions of rows), so transportable InnoDB files seems to be the only practical solution for creating the initial copy for the replication. However, as we use range partitioning (our large tables are time series data), I can't see any way to do this. Regards, Conor
Hi Conor, Did you try the following? ALTER TABLE t1 EXCHANGE PARTITION p1 WITH TABLE t2; It might allow you to work around the problem. You'd first need to convert the partitions to tables, then export and import the .ibd files, and finally exchange with partitions again. Disclaimer: I only know InnoDB, not partitioning. I hope that this will actually only rename the tables inside InnoDB. (In InnoDB, each partition is treated as a separate table.) Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation DON’T MISS M|18 MariaDB User Conference February 26 - 27, 2018 New York City https://m18.mariadb.com/
Hi Marko, I don't think that's an option - It would interfere with the normal operations on the master, currently we use a "FLUSH TABLES WITH READ LOCK" with an LVM snapshot to get a consistent copy of the database while the system is up and running. Changing the PARTITIONs to TABLEs would mean having to stop the running system. - I haven't checked this in a while (> a year), but the last time we looked at swapping a TABLE back to a PARTITION, it took hours, I think it's because it validates that the rows in the TABLE belong in the PARTITION according to the partitioning scheme. MySQL has the "WITHOUT VALIDATION" option to overcome this Regards, Conor
Hi, Have you considered using Xtrabackup to create a replication slave? This is essentially a binary copy kept in sync with the innodb transaction log, and it works great. On Wed, Jan 17, 2018 at 12:55 PM, Conor Murphy < conor_mark_murphy@hotmail.com> wrote:
Hi Marko,
I don't think that's an option
- It would interfere with the normal operations on the master, currently we use a "FLUSH TABLES WITH READ LOCK" with an LVM snapshot to get a consistent copy of the database while the system is up and running. Changing the PARTITIONs to TABLEs would mean having to stop the running system.
- I haven't checked this in a while (> a year), but the last time we looked at swapping a TABLE back to a PARTITION, it took hours, I think it's because it validates that the rows in the TABLE belong in the PARTITION according to the partitioning scheme. MySQL has the "WITHOUT VALIDATION" option to overcome this
Regards, Conor
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi, I'm trying to figure out why an a composite INDEX is only being partially used during a query Given the following tables, MariaDB [tmp]> SHOW CREATE TABLE reftable; +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | reftable | CREATE TABLE `reftable` ( `id` mediumint(9) NOT NULL AUTO_INCREMENT, `groupid` smallint(6) NOT NULL, `name` varchar(50) CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL, PRIMARY KEY (`id`), KEY `groupIdIdx` (`groupid`) ) ENGINE=MyISAM AUTO_INCREMENT=25001 DEFAULT CHARSET=latin1 | +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) MariaDB [tmp]> SHOW CREATE TABLE valtable; +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | valtable | CREATE TABLE `valtable` ( `refid` mediumint(9) NOT NULL, `time` datetime NOT NULL, `value` int(10) unsigned NOT NULL, KEY `refTimeIdx` (`refid`,`time`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 | +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) The following query seems to be only using the refid part of the index MariaDB [tmp]> EXPLAIN -> SELECT AVG(valtable.value) -> FROM valtable, reftable -> WHERE -> valtable.refid = reftable.id AND reftable.groupid = 2000 AND -> valtable.time BETWEEN '2018-01-15 00:00:00' AND '2018-01-15 23:59:59'; +------+-------------+----------+------+-------------------+-----------+---------+-----------------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+------+-------------------+-----------+---------+-----------------+------+-----------------------+ | 1 | SIMPLE | reftable | ref | PRIMARY,serverIdx | serverIdx | 2 | const | 4 | | | 1 | SIMPLE | valtable | ref | refid | refid | 3 | tmp.reftable.id | 690 | Using index condition | +------+-------------+----------+------+-------------------+-----------+---------+-----------------+------+-----------------------+ But if I change the query and and directly specify the refids, it makes full use of the index MariaDB [tmp]> SELECT id FROM reftable WHERE groupid = 2000; +-------+ | id | +-------+ | 9996 | | 9997 | | 9998 | | 9999 | | 10000 | +-------+ 5 rows in set (0.00 sec) MariaDB [tmp]> EXPLAIN -> SELECT AVG(valtable.value) -> FROM valtable -> WHERE -> valtable.refid IN ( 9996, 9997, 9998, 9999, 10000 ) AND -> valtable.time BETWEEN '2018-01-15 00:00:00' AND '2018-01-15 23:59:59'; +------+-------------+----------+-------+---------------+------------+---------+------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+-------+---------------+------------+---------+------+------+-----------------------+ | 1 | SIMPLE | valtable | range | refTimeIdx | refTimeIdx | 11 | NULL | 26 | Using index condition | +------+-------------+----------+-------+---------------+------------+---------+------+------+-----------------------+ 1 row in set (0.00 sec) Is this expected behaviour and if so is there any reference that explains the behaviour? Thanks, Conor
participants (3)
-
Conor Murphy
-
Justin Swanhart
-
Marko Mäkelä