That is really weird. Just having a visible vs. invisible PK should not have made any difference at all. In InnoDB there is always a PK, if you don't define one, an invisible 48-bit integer one will be defined for you.

Reasoning behind my approach was that secondary keys point at the PK, and the table is clustered in PK. So what I proposed should have avoided the additional key dereference that might have save about half of the time.

So if explicit vs. implicit PK makes a 10x difference, and it is not due to buffer pool being hot vs. cold, that is most definitely a critical performance bug you just stumbled upon.


On Sun, 29 Oct 2023, 21:08 Ivan Krylov via discuss, <discuss@lists.mariadb.org> wrote:
On Sun, 29 Oct 2023 18:12:53 +0200
Gordan Bobic <gordan.bobic@gmail.com> wrote:

> ALTER TABLE mol_trans
> DROP INDEX spid_flag_wl,
> ADD COLUMN id int unsigned auto_increment,
> ADD PRIMARY KEY (species_id, flag, wl_vac, id);

I wasn't able to add a primary key like this. My version of MariaDB
only allows the id column in the beginning of the compound primary key.
Additionally, trying to add a primary key to an existing table results
in an error message complaining about the index for the table being
corrupted (and rolling back the implicit transaction).

I tried recreating the table with PRIMARY KEY (species_id, flag,
wl_vac, upper_id, lower_id). By itself, the combination of upper_id and
lower_id must be unique, and it's our mistake that we didn't properly
declare them as foreign keys into a different table. Unfortunately,
this didn't improve the performance.

What _did_ improve performance was creating the id column with the type
INT UNSIGNED AUTO_INCREMENT and setting it to be the primary key. With
the index spid_flag_wl(species_id, flag, wl_vac) recreated, the EXPLAIN
output now looks a bit differently:

           id: 1
  select_type: SIMPLE
        table: mtr
         type: range
possible_keys: spid_flag_wl
          key: spid_flag_wl
      key_len: 16
          ref: NULL
         rows: 5487882
        Extra: Using index condition

...and I get my 3024559 rows in slightly more than 6 seconds.

So many thanks for giving me a pointer in the direction that eventually
helped solve my problem, even if I don't fully understand why it works.
Is the lesson here to always create a synthetic primary key for a table?

I've also tried recreating the table with the "natural" primary key of
(upper_id, lower_id), but it takes much longer to reinsert the rows (it
still isn't complete after tens of minutes, and previously reinsert
would be done in just a few minutes), probably because the rows don't
go in the natural order by upper_id and lower_id at all.

--
Best regards,
Ivan
_______________________________________________
discuss mailing list -- discuss@lists.mariadb.org
To unsubscribe send an email to discuss-leave@lists.mariadb.org