Speeding up a query: how to send 3 million rows faster?
Dear MariaDB users, (I've asked the same question at <https://dba.stackexchange.com/q/332557>, so feel free to ignore this one if you've already seen the other.) Our scientific application needs to store and query fundamental parameters for a number of molecules. There are 2 to 28 million rows per molecule, but the number of molecules is expected to stay small (currently 4). Here's our schema: CREATE TABLE `mol_trans` ( `species_id` int(11) DEFAULT NULL, `wl_vac` double DEFAULT NULL, `upper_id` int(11) DEFAULT NULL, `lower_id` int(11) DEFAULT NULL, `prob` double DEFAULT NULL, `flag` tinyint(4) DEFAULT NULL, KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci PARTITION BY LIST (`species_id`) (PARTITION `CaO` VALUES IN (6115) ENGINE = InnoDB, PARTITION `CN3` VALUES IN (6121) ENGINE = InnoDB, PARTITION `CN2` VALUES IN (6119) ENGINE = InnoDB, PARTITION `AlO` VALUES IN (6109) ENGINE = InnoDB) (The partitions are here to make it easier to drop a whole molecule if needed, which would otherwise be a painful large DELETE.) I'm using the admittedly old MariaDB 10.3.39 from Debian Buster for tests (connecting via the UNIX domain socket using the command line client), but we've been seeing the same problem on MySQL 5.6 and MariaDB 10.11 on Windows 10. The following query currently takes 1m 7s on my computer, as measured by running `time echo "$QUERY" | mysql "$DATABASE" >/dev/null`: select mtr.prob, mtr.lower_id, mtr.upper_id from mol_trans mtr where ( mtr.species_id=6115 and mtr.wl_vac > 766.0 and mtr.wl_vac < 883.0 and mtr.flag = 1 ) order by mtr.wl_vac; It needs to return 3024559 rows. The query seems to be using the right index, judging by the EXPLAIN output: id: 1 select_type: SIMPLE table: mtr type: ref possible_keys: spid_flag_wl key: spid_flag_wl key_len: 7 ref: const,const rows: 14158123 Extra: Using where With SET PROFILING=ON and the query cache disabled, the single operation with a duration of >1 second in the 23-row profile is "Sending data": Status: Sending data Duration: 66.712277 CPU_user: 66.436978 CPU_system: 0.279932 Context_voluntary: 2156 Context_involuntary: 6964 Block_ops_out: 0 Messages_sent: 0 Messages_received: 0 Page_faults_major: 0 Page_faults_minor: 0 Swaps: 0 Source_function: <unknown> Source_file: sql_parse.cc Source_line: 6086 Is there a way to spend less time sending data? I know the query can in theory run faster because it takes less than 10 seconds on SQLite and PostgreSQL, but if we switched our application from database engine to database engine every time we started having performance problems, we would be very far from done by now. I also know that MariaDB can be very fast at transferring data because it takes me only a few minutes more to download the whole database dump. -- Best regards, Ivan
On Sunday 29 October 2023 at 16:03:28, Ivan Krylov via discuss wrote:
Our scientific application needs to store and query fundamental parameters for a number of molecules. There are 2 to 28 million rows per molecule, but the number of molecules is expected to stay small (currently 4).
With SET PROFILING=ON and the query cache disabled, the single operation with a duration of >1 second in the 23-row profile is "Sending data":
The first question I would ask about this is "what spec hardware are you running it on?" 1. How much RAM? 2. HDD or SSD? 3. CPU speed / cores? Also, you said "2 to 28 million rows per molecule, and 4 molecules", but what's the actual quantity of data (Gbytes used on disk)? Antony. -- "When you talk about Linux versus Windows, you're talking about which operating system is the best value for money and fit for purpose. That's a very basic decision customers can make if they have the information available to them. Quite frankly if we lose to Linux because our customers say it's better value for money, tough luck for us." - Steve Vamos, MD of Microsoft Australia Please reply to the list; please *don't* CC me.
Thank you for replying so quickly! On Sun, 29 Oct 2023 16:11:28 +0100 Antony Stone via discuss <discuss@lists.mariadb.org> wrote:
The first question I would ask about this is "what spec hardware are you running it on?"
1. How much RAM?
32 GB.
2. HDD or SSD?
SSD. To be more specific, ext4 on top of LVM on top of LUKS on top of an NVMe SSD. I can provide some measurements if needed.
3. CPU speed / cores?
It's an 8-core Ryzen 3900X. It dynamically speeds up to 4.5GHz, but the mysqld process only uses one core while the query is running, and the other ones idle at ~2GHz.
Also, you said "2 to 28 million rows per molecule, and 4 molecules", but what's the actual quantity of data (Gbytes used on disk)?
Here are the sizes of large (>100M) files under /var/lib/mysql: 1,4G ibdata1 469M mol_trans#P#AlO.ibd 2,7G mol_trans#P#CaO.ibd # <-- species_id = 6115 221M mol_trans#P#CN2.ibd 473M mol_trans#P#CN3.ibd There are other tables in the database, but none as large as mol_trans. -- Best regards, Ivan
On Sun, Oct 29, 2023 at 5:03 PM Ivan Krylov via discuss <discuss@lists.mariadb.org> wrote:
Dear MariaDB users,
(I've asked the same question at <https://dba.stackexchange.com/q/332557>, so feel free to ignore this one if you've already seen the other.)
Our scientific application needs to store and query fundamental parameters for a number of molecules. There are 2 to 28 million rows per molecule, but the number of molecules is expected to stay small (currently 4). Here's our schema:
CREATE TABLE `mol_trans` ( `species_id` int(11) DEFAULT NULL, `wl_vac` double DEFAULT NULL, `upper_id` int(11) DEFAULT NULL, `lower_id` int(11) DEFAULT NULL, `prob` double DEFAULT NULL, `flag` tinyint(4) DEFAULT NULL, KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci PARTITION BY LIST (`species_id`) (PARTITION `CaO` VALUES IN (6115) ENGINE = InnoDB, PARTITION `CN3` VALUES IN (6121) ENGINE = InnoDB, PARTITION `CN2` VALUES IN (6119) ENGINE = InnoDB, PARTITION `AlO` VALUES IN (6109) ENGINE = InnoDB)
(The partitions are here to make it easier to drop a whole molecule if needed, which would otherwise be a painful large DELETE.)
I'm using the admittedly old MariaDB 10.3.39 from Debian Buster for tests (connecting via the UNIX domain socket using the command line client), but we've been seeing the same problem on MySQL 5.6 and MariaDB 10.11 on Windows 10.
The following query currently takes 1m 7s on my computer, as measured by running `time echo "$QUERY" | mysql "$DATABASE" >/dev/null`:
select mtr.prob, mtr.lower_id, mtr.upper_id from mol_trans mtr where ( mtr.species_id=6115 and mtr.wl_vac > 766.0 and mtr.wl_vac < 883.0 and mtr.flag = 1 ) order by mtr.wl_vac;
Try this: ALTER TABLE mol_trans DROP INDEX spid_flag_wl, ADD COLUMN id int unsigned auto_increment, ADD PRIMARY KEY (species_id, flag, wl_vac, id); That should avoid a secondary key dereference and shave maybe half of the execution time off.
On Sun, 29 Oct 2023 18:12:53 +0200 Gordan Bobic <gordan.bobic@gmail.com> wrote:
ALTER TABLE mol_trans DROP INDEX spid_flag_wl, ADD COLUMN id int unsigned auto_increment, ADD PRIMARY KEY (species_id, flag, wl_vac, id);
I wasn't able to add a primary key like this. My version of MariaDB only allows the id column in the beginning of the compound primary key. Additionally, trying to add a primary key to an existing table results in an error message complaining about the index for the table being corrupted (and rolling back the implicit transaction). I tried recreating the table with PRIMARY KEY (species_id, flag, wl_vac, upper_id, lower_id). By itself, the combination of upper_id and lower_id must be unique, and it's our mistake that we didn't properly declare them as foreign keys into a different table. Unfortunately, this didn't improve the performance. What _did_ improve performance was creating the id column with the type INT UNSIGNED AUTO_INCREMENT and setting it to be the primary key. With the index spid_flag_wl(species_id, flag, wl_vac) recreated, the EXPLAIN output now looks a bit differently: id: 1 select_type: SIMPLE table: mtr type: range possible_keys: spid_flag_wl key: spid_flag_wl key_len: 16 ref: NULL rows: 5487882 Extra: Using index condition ...and I get my 3024559 rows in slightly more than 6 seconds. So many thanks for giving me a pointer in the direction that eventually helped solve my problem, even if I don't fully understand why it works. Is the lesson here to always create a synthetic primary key for a table? I've also tried recreating the table with the "natural" primary key of (upper_id, lower_id), but it takes much longer to reinsert the rows (it still isn't complete after tens of minutes, and previously reinsert would be done in just a few minutes), probably because the rows don't go in the natural order by upper_id and lower_id at all. -- Best regards, Ivan
That is really weird. Just having a visible vs. invisible PK should not have made any difference at all. In InnoDB there is always a PK, if you don't define one, an invisible 48-bit integer one will be defined for you. Reasoning behind my approach was that secondary keys point at the PK, and the table is clustered in PK. So what I proposed should have avoided the additional key dereference that might have save about half of the time. So if explicit vs. implicit PK makes a 10x difference, and it is not due to buffer pool being hot vs. cold, that is most definitely a critical performance bug you just stumbled upon. On Sun, 29 Oct 2023, 21:08 Ivan Krylov via discuss, < discuss@lists.mariadb.org> wrote:
On Sun, 29 Oct 2023 18:12:53 +0200 Gordan Bobic <gordan.bobic@gmail.com> wrote:
ALTER TABLE mol_trans DROP INDEX spid_flag_wl, ADD COLUMN id int unsigned auto_increment, ADD PRIMARY KEY (species_id, flag, wl_vac, id);
I wasn't able to add a primary key like this. My version of MariaDB only allows the id column in the beginning of the compound primary key. Additionally, trying to add a primary key to an existing table results in an error message complaining about the index for the table being corrupted (and rolling back the implicit transaction).
I tried recreating the table with PRIMARY KEY (species_id, flag, wl_vac, upper_id, lower_id). By itself, the combination of upper_id and lower_id must be unique, and it's our mistake that we didn't properly declare them as foreign keys into a different table. Unfortunately, this didn't improve the performance.
What _did_ improve performance was creating the id column with the type INT UNSIGNED AUTO_INCREMENT and setting it to be the primary key. With the index spid_flag_wl(species_id, flag, wl_vac) recreated, the EXPLAIN output now looks a bit differently:
id: 1 select_type: SIMPLE table: mtr type: range possible_keys: spid_flag_wl key: spid_flag_wl key_len: 16 ref: NULL rows: 5487882 Extra: Using index condition
...and I get my 3024559 rows in slightly more than 6 seconds.
So many thanks for giving me a pointer in the direction that eventually helped solve my problem, even if I don't fully understand why it works. Is the lesson here to always create a synthetic primary key for a table?
I've also tried recreating the table with the "natural" primary key of (upper_id, lower_id), but it takes much longer to reinsert the rows (it still isn't complete after tens of minutes, and previously reinsert would be done in just a few minutes), probably because the rows don't go in the natural order by upper_id and lower_id at all.
-- Best regards, Ivan _______________________________________________ discuss mailing list -- discuss@lists.mariadb.org To unsubscribe send an email to discuss-leave@lists.mariadb.org
On Mon, 30 Oct 2023 08:10:02 +0200 Gordan Bobic <gordan.bobic@gmail.com> wrote:
That is really weird. Just having a visible vs. invisible PK should not have made any difference at all. In InnoDB there is always a PK, if you don't define one, an invisible 48-bit integer one will be defined for you.
Thank you for confirming this! The strangeness continues. I've built MariaDB-11.1.2 and transferred the SQL dump of the table in question there. With the original table definition, I get the following query plan: CREATE TABLE `mol_trans` ( `species_id` int(11) DEFAULT NULL, `wl_vac` double DEFAULT NULL, `upper_id` int(11) DEFAULT NULL, `lower_id` int(11) DEFAULT NULL, `prob` double DEFAULT NULL, `flag` tinyint(4) DEFAULT NULL, KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci PARTITION BY LIST (`species_id`) (PARTITION `CaO` VALUES IN (6115) ENGINE = InnoDB, PARTITION `CN3` VALUES IN (6121) ENGINE = InnoDB, PARTITION `CN2` VALUES IN (6119) ENGINE = InnoDB, PARTITION `AlO` VALUES IN (6109) ENGINE = InnoDB); explain select mtr.prob, mtr.lower_id, mtr.upper_id from mol_trans mtr where ( mtr.species_id=6115 and mtr.wl_vac > 766.0 and mtr.wl_vac < 883.0 and mtr.flag = 1 ) order by mtr.wl_vac\G id: 1 select_type: SIMPLE table: mtr type: ALL possible_keys: spid_flag_wl key: NULL key_len: NULL ref: NULL rows: 26559953 Extra: Using where; Using filesort Note that it doesn't even use the index now. This SELECT takes upwards of 1 minute to complete. If I remove the partitions *and* add an explicit synthetic primary key, I get a query plan that uses the index: CREATE TABLE `mol_trans_3` ( `species_id` int(11) NOT NULL, `wl_vac` double DEFAULT NULL, `upper_id` int(11) DEFAULT NULL, `lower_id` int(11) DEFAULT NULL, `prob` double DEFAULT NULL, `flag` tinyint(4) DEFAULT NULL, `id` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`id`), KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`) ) ENGINE=InnoDB AUTO_INCREMENT=81305699 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci id: 1 select_type: SIMPLE table: mtr type: range possible_keys: spid_flag_wl key: spid_flag_wl key_len: 15 ref: NULL rows: 6083086 Extra: Using index condition This query just executed on my machine in 21 seconds. Finally, if I go back to the original table and force the use of the index, I get a third query plan: explain select mtr.prob, mtr.lower_id, mtr.upper_id from mol_trans mtr force index (spid_flag_wl) where ( mtr.species_id=6115 and mtr.wl_vac > 766.0 and mtr.wl_vac < 883.0 and mtr.flag = 1 ) order by mtr.wl_vac\G id: 1 select_type: SIMPLE table: mtr type: range possible_keys: spid_flag_wl key: spid_flag_wl key_len: 16 ref: NULL rows: 6266474 Extra: Using where ...and the query now completes in 28 seconds. ANALYZE TABLE mol_trans PERSISTENT FOR ALL doesn't seem to lead to further improvements. Anything else I could do to speed this up? With MariaDB-10.3.39-0+deb10u1, I somehow get the result in 10-20 seconds using the mol_trans_3 table, no matter whether the query specifies FORCE INDEX or not. -- Best regards, Ivan
10.3 was a really good, solid, predictable release. Beyond 10.4 I have seen a lot of weirdness which is why I have been avoiding it wherever reasonably possible. Additionally, version inflation on the whole leads to performance deflation, even without execution plan differences. This has been the case since the dawn of time. So if 10.3 works well for you and it isn't in an untrusted environment, I would stick with it while you can. On Mon, Oct 30, 2023 at 1:29 PM Ivan Krylov <krylov.r00t@gmail.com> wrote:
On Mon, 30 Oct 2023 08:10:02 +0200 Gordan Bobic <gordan.bobic@gmail.com> wrote:
That is really weird. Just having a visible vs. invisible PK should not have made any difference at all. In InnoDB there is always a PK, if you don't define one, an invisible 48-bit integer one will be defined for you.
Thank you for confirming this!
The strangeness continues. I've built MariaDB-11.1.2 and transferred the SQL dump of the table in question there. With the original table definition, I get the following query plan:
CREATE TABLE `mol_trans` ( `species_id` int(11) DEFAULT NULL, `wl_vac` double DEFAULT NULL, `upper_id` int(11) DEFAULT NULL, `lower_id` int(11) DEFAULT NULL, `prob` double DEFAULT NULL, `flag` tinyint(4) DEFAULT NULL, KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci PARTITION BY LIST (`species_id`) (PARTITION `CaO` VALUES IN (6115) ENGINE = InnoDB, PARTITION `CN3` VALUES IN (6121) ENGINE = InnoDB, PARTITION `CN2` VALUES IN (6119) ENGINE = InnoDB, PARTITION `AlO` VALUES IN (6109) ENGINE = InnoDB);
explain select mtr.prob, mtr.lower_id, mtr.upper_id from mol_trans mtr where ( mtr.species_id=6115 and mtr.wl_vac > 766.0 and mtr.wl_vac < 883.0 and mtr.flag = 1 ) order by mtr.wl_vac\G
id: 1 select_type: SIMPLE table: mtr type: ALL possible_keys: spid_flag_wl key: NULL key_len: NULL ref: NULL rows: 26559953 Extra: Using where; Using filesort
Note that it doesn't even use the index now. This SELECT takes upwards of 1 minute to complete.
If I remove the partitions *and* add an explicit synthetic primary key, I get a query plan that uses the index:
CREATE TABLE `mol_trans_3` ( `species_id` int(11) NOT NULL, `wl_vac` double DEFAULT NULL, `upper_id` int(11) DEFAULT NULL, `lower_id` int(11) DEFAULT NULL, `prob` double DEFAULT NULL, `flag` tinyint(4) DEFAULT NULL, `id` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`id`), KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`) ) ENGINE=InnoDB AUTO_INCREMENT=81305699 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci
id: 1 select_type: SIMPLE table: mtr type: range possible_keys: spid_flag_wl key: spid_flag_wl key_len: 15 ref: NULL rows: 6083086 Extra: Using index condition
This query just executed on my machine in 21 seconds.
Finally, if I go back to the original table and force the use of the index, I get a third query plan:
explain select mtr.prob, mtr.lower_id, mtr.upper_id from mol_trans mtr force index (spid_flag_wl) where ( mtr.species_id=6115 and mtr.wl_vac > 766.0 and mtr.wl_vac < 883.0 and mtr.flag = 1 ) order by mtr.wl_vac\G
id: 1 select_type: SIMPLE table: mtr type: range possible_keys: spid_flag_wl key: spid_flag_wl key_len: 16 ref: NULL rows: 6266474 Extra: Using where
...and the query now completes in 28 seconds.
ANALYZE TABLE mol_trans PERSISTENT FOR ALL doesn't seem to lead to further improvements. Anything else I could do to speed this up?
With MariaDB-10.3.39-0+deb10u1, I somehow get the result in 10-20 seconds using the mol_trans_3 table, no matter whether the query specifies FORCE INDEX or not.
-- Best regards, Ivan
On Sun, Oct 29, 2023 at 7:08 PM Ivan Krylov via discuss < discuss@lists.mariadb.org> wrote:
On Sun, 29 Oct 2023 18:12:53 +0200 Gordan Bobic <gordan.bobic@gmail.com> wrote:
ALTER TABLE mol_trans DROP INDEX spid_flag_wl, ADD COLUMN id int unsigned auto_increment, ADD PRIMARY KEY (species_id, flag, wl_vac, id);
I wasn't able to add a primary key like this. My version of MariaDB only allows the id column in the beginning of the compound primary key.
I can't comment on the performance issues but it is possible to have the auto generated id column as last in the PK index. It only needs an index with id as the first column. This should not cause an error: ALTER TABLE mol_trans DROP INDEX spid_flag_wl, ADD COLUMN id int unsigned auto_increment, ADD INDEX (id), ADD PRIMARY KEY (species_id, flag, wl_vac, id); Pantelis Theodosiou
Just wondering, If you change the table format to MyISAM do you get better performance? I tend to use MyISAM for scientific loads where transactions and rollback are not required. Vassilis On 10/29/23 17:03, Ivan Krylov via discuss wrote:
Dear MariaDB users,
(I've asked the same question at <https://dba.stackexchange.com/q/332557>, so feel free to ignore this one if you've already seen the other.)
Our scientific application needs to store and query fundamental parameters for a number of molecules. There are 2 to 28 million rows per molecule, but the number of molecules is expected to stay small (currently 4). Here's our schema:
CREATE TABLE `mol_trans` ( `species_id` int(11) DEFAULT NULL, `wl_vac` double DEFAULT NULL, `upper_id` int(11) DEFAULT NULL, `lower_id` int(11) DEFAULT NULL, `prob` double DEFAULT NULL, `flag` tinyint(4) DEFAULT NULL, KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci PARTITION BY LIST (`species_id`) (PARTITION `CaO` VALUES IN (6115) ENGINE = InnoDB, PARTITION `CN3` VALUES IN (6121) ENGINE = InnoDB, PARTITION `CN2` VALUES IN (6119) ENGINE = InnoDB, PARTITION `AlO` VALUES IN (6109) ENGINE = InnoDB)
(The partitions are here to make it easier to drop a whole molecule if needed, which would otherwise be a painful large DELETE.)
I'm using the admittedly old MariaDB 10.3.39 from Debian Buster for tests (connecting via the UNIX domain socket using the command line client), but we've been seeing the same problem on MySQL 5.6 and MariaDB 10.11 on Windows 10.
The following query currently takes 1m 7s on my computer, as measured by running `time echo "$QUERY" | mysql "$DATABASE" >/dev/null`:
select mtr.prob, mtr.lower_id, mtr.upper_id from mol_trans mtr where ( mtr.species_id=6115 and mtr.wl_vac > 766.0 and mtr.wl_vac < 883.0 and mtr.flag = 1 ) order by mtr.wl_vac;
It needs to return 3024559 rows. The query seems to be using the right index, judging by the EXPLAIN output:
id: 1 select_type: SIMPLE table: mtr type: ref possible_keys: spid_flag_wl key: spid_flag_wl key_len: 7 ref: const,const rows: 14158123 Extra: Using where
With SET PROFILING=ON and the query cache disabled, the single operation with a duration of >1 second in the 23-row profile is "Sending data":
Status: Sending data Duration: 66.712277 CPU_user: 66.436978 CPU_system: 0.279932 Context_voluntary: 2156 Context_involuntary: 6964 Block_ops_out: 0 Messages_sent: 0 Messages_received: 0 Page_faults_major: 0 Page_faults_minor: 0 Swaps: 0 Source_function: <unknown> Source_file: sql_parse.cc Source_line: 6086
Is there a way to spend less time sending data? I know the query can in theory run faster because it takes less than 10 seconds on SQLite and PostgreSQL, but if we switched our application from database engine to database engine every time we started having performance problems, we would be very far from done by now. I also know that MariaDB can be very fast at transferring data because it takes me only a few minutes more to download the whole database dump.
В Tue, 31 Oct 2023 11:45:52 +0200 Vassilis Virvilis <vasvir@iit.demokritos.gr> пишет:
If you change the table format to MyISAM do you get better performance?
I almost missed your message, and that would have been a shame, because if I change the table format to MyISAM, I get the query results in ~3.5 seconds, which is faster than any other result I've been getting with MariaDB. Many thanks! This is the largest table in our database, but it's also the easiest to recreate, so we might just switch it to MyISAM with no apparent downsides. -- Best regards, Ivan
I have to say that surprises me. MyISAM became slower even on read-only workloads around MySQL 5. One of the reasons was that MyISAM uses the OS page cache for data caching and only caches index data internally. Which means that unless you have a covering index, you end up with more context switching to retrieve the row data. But maybe InnoDB has since been "enhanced" to the point where it is no longer outright faster on every workload. On Wed, Nov 1, 2023 at 12:14 PM Ivan Krylov via discuss <discuss@lists.mariadb.org> wrote:
В Tue, 31 Oct 2023 11:45:52 +0200 Vassilis Virvilis <vasvir@iit.demokritos.gr> пишет:
If you change the table format to MyISAM do you get better performance?
I almost missed your message, and that would have been a shame, because if I change the table format to MyISAM, I get the query results in ~3.5 seconds, which is faster than any other result I've been getting with MariaDB. Many thanks!
This is the largest table in our database, but it's also the easiest to recreate, so we might just switch it to MyISAM with no apparent downsides.
-- Best regards, Ivan _______________________________________________ discuss mailing list -- discuss@lists.mariadb.org To unsubscribe send an email to discuss-leave@lists.mariadb.org
To summarise: * "Sending data" is a red herring. MariaDB sets stage_sending_data before calling do_select(), which may perform a lot of additional work _besides_ serialising and sending data to the client. * Various sacrifices to the query optimiser spirit, such as creating an additional synthetic primary key column, can be made, but there is no good reason for them to be working. InnoDB has a perfectly good hidden rowid that it should be able to use. * There is a difference between a query plan that seems to automatically use an index and a query plan that is _forced_ to use an index: MariaDB [spmodel]> analyze select -> mtr.prob, mtr.lower_id, mtr.upper_id -> from mol_trans mtr -> where ( -> mtr.species_id=6115 -> and mtr.wl_vac > 766.0 -> and mtr.wl_vac < 883.0 -> and mtr.flag = 1 -> ) -> order by mtr.wl_vac\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: mtr type: ref possible_keys: spid_flag_wl key: spid_flag_wl key_len: 7 ref: const,const rows: 14025100 r_rows: 28417908.00 filtered: 100.00 r_filtered: 10.64 Extra: Using where 1 row in set (1 min 48,719 sec) MariaDB [spmodel]> analyze select -> mtr.prob, mtr.lower_id, mtr.upper_id -> from mol_trans mtr -> force index(spid_flag_wl) -- The only difference! -> where ( -> mtr.species_id=6115 -> and mtr.wl_vac > 766.0 -> and mtr.wl_vac < 883.0 -> and mtr.flag = 1 -> ) -> order by mtr.wl_vac\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: mtr type: range possible_keys: spid_flag_wl key: spid_flag_wl key_len: 16 ref: NULL rows: 6260712 r_rows: 3024559.00 filtered: 100.00 r_filtered: 100.00 Extra: Using where 1 row in set (11,086 sec) The latter can be much faster than the former. Since FORCE INDEX improves the performance of this query on everything ranging from MariaDB-10.3.39-0+deb10u1 to MySQL 5.6 on Windows 10, we'll be using this. * Is any of this worth reporting as a query optimiser bug? The contents of the table are the result of preprocessing a CC-BY-SA-4.0 dataset, so the only thing stopping me from reporting this is the dump being 2 GB in size. My perf-fu is probably not enough to find the source of the problem (if it is a problem) by myself. * The last but not the least, thanks to Gordan Bobic for valuable advice! -- Best regards, Ivan
I would say that is definitely worthy of a bug report. A really interesting bug, too. On Tue, Oct 31, 2023 at 1:15 PM Ivan Krylov via discuss <discuss@lists.mariadb.org> wrote:
To summarise:
* "Sending data" is a red herring. MariaDB sets stage_sending_data before calling do_select(), which may perform a lot of additional work _besides_ serialising and sending data to the client.
* Various sacrifices to the query optimiser spirit, such as creating an additional synthetic primary key column, can be made, but there is no good reason for them to be working. InnoDB has a perfectly good hidden rowid that it should be able to use.
* There is a difference between a query plan that seems to automatically use an index and a query plan that is _forced_ to use an index:
MariaDB [spmodel]> analyze select -> mtr.prob, mtr.lower_id, mtr.upper_id -> from mol_trans mtr -> where ( -> mtr.species_id=6115 -> and mtr.wl_vac > 766.0 -> and mtr.wl_vac < 883.0 -> and mtr.flag = 1 -> ) -> order by mtr.wl_vac\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: mtr type: ref possible_keys: spid_flag_wl key: spid_flag_wl key_len: 7 ref: const,const rows: 14025100 r_rows: 28417908.00 filtered: 100.00 r_filtered: 10.64 Extra: Using where 1 row in set (1 min 48,719 sec)
MariaDB [spmodel]> analyze select -> mtr.prob, mtr.lower_id, mtr.upper_id -> from mol_trans mtr -> force index(spid_flag_wl) -- The only difference! -> where ( -> mtr.species_id=6115 -> and mtr.wl_vac > 766.0 -> and mtr.wl_vac < 883.0 -> and mtr.flag = 1 -> ) -> order by mtr.wl_vac\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: mtr type: range possible_keys: spid_flag_wl key: spid_flag_wl key_len: 16 ref: NULL rows: 6260712 r_rows: 3024559.00 filtered: 100.00 r_filtered: 100.00 Extra: Using where 1 row in set (11,086 sec)
The latter can be much faster than the former. Since FORCE INDEX improves the performance of this query on everything ranging from MariaDB-10.3.39-0+deb10u1 to MySQL 5.6 on Windows 10, we'll be using this.
* Is any of this worth reporting as a query optimiser bug? The contents of the table are the result of preprocessing a CC-BY-SA-4.0 dataset, so the only thing stopping me from reporting this is the dump being 2 GB in size. My perf-fu is probably not enough to find the source of the problem (if it is a problem) by myself.
* The last but not the least, thanks to Gordan Bobic for valuable advice!
-- Best regards, Ivan _______________________________________________ discuss mailing list -- discuss@lists.mariadb.org To unsubscribe send an email to discuss-leave@lists.mariadb.org
participants (5)
-
Antony Stone
-
Gordan Bobic
-
Ivan Krylov
-
Pantelis Theodosiou
-
Vassilis Virvilis