Replication corrupts index on InnoDB
Hi all,

I have a very strange problem that I am not able to debug any further, so I am asking here for any advice on finding the root cause.

I run a master-slave replication setup. Both servers run Debian Bookworm with the packages from the distribution (10.11.4-MariaDB-1~deb12u1-log). There are two tables in two databases with the same setup (same shop software), and both suffer from recurring index corruption on the slave side. While the master works without any issues, the slave stops replicating from time to time with this error:

Last_Errno: 1712
Last_Error: Error 'Index s_order_details is corrupted' on query. Default database: 'foobar_shop'. Query: 'DELETE FROM s_order_details WHERE orderID='40007''

After rebuilding the index by running "alter table foobar_shop.s_order_details engine=innodb", I can restart the slave and it works again.

What I have done so far:
- Rebuilt the whole slave from fresh SQL dumps (twice!)
- Changed replication to "Using_Gtid: Slave_Pos"
- Reduced the number of slave parallel threads

Settings that differ from the Debian defaults:
- binlog_format = mixed
- expire_logs_days = 1
- max_binlog_size = 256M
- slave_parallel_max_queued=524288
- slave_parallel_threads=5
- sync_binlog=0
- sync_relay_log=0
- innodb_flush_log_at_trx_commit=0

Best regards
Oliver

-- 
Protect your environment - close windows and adopt a penguin!
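The manual recovery described in the first message (rebuild the table, then restart the slave) can be sketched as the SQL session below. This is a hedged sketch, not a procedure confirmed in the thread: the CHECK TABLE step and the sql_log_bin toggle are additions that assume the replica writes its own binary log and that the account has the privilege to change sql_log_bin.

```sql
-- Sketch of the manual recovery, run on the replica.
-- The table name comes from the Last_Error message; substitute whichever
-- table the error reports.

-- The SQL thread has already stopped on error 1712; stop both threads cleanly.
STOP SLAVE;

-- Optional (added step): confirm the index corruption before rebuilding.
CHECK TABLE foobar_shop.s_order_details;

-- Assumption: keep the rebuild out of the replica's own binary log so it does
-- not create a local transaction/GTID there (needs a privileged account).
SET SESSION sql_log_bin = 0;

-- A "null" ALTER rebuilds the table together with all of its indexes.
ALTER TABLE foobar_shop.s_order_details ENGINE=InnoDB;

SET SESSION sql_log_bin = 1;

-- Resume replication, then check that Slave_SQL_Running is Yes again.
START SLAVE;
SHOW SLAVE STATUS\G
```

Note that later in this thread the rebuild only served to get replication running again; the lasting fix reported was upgrading to 10.11.6.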
Hi Oliver,

It is good that you can reproduce this after rebuilding the replica from a logical SQL dump. Was everything done on MariaDB Server 10.11.4?

Several bugs that could explain this corruption have been fixed since the release of MariaDB Server 10.11.4, including MDEV-32530, MDEV-31767 and MDEV-30531. Can you reproduce this corruption after upgrading to 10.11.6?

Best regards,
Marko
Hello Marko,

thanks for the fast reply. The tests with the rebuild were done with 10.11.5 taken from Debian unstable, as I had to install those packages early last year when I got hit by https://jira.mariadb.org/browse/MDEV-30531 after upgrading to Bookworm. To have a clean state to start from, I rolled this back two weeks ago, but this time I used a binary copy from the master.

I will follow your suggestion, prepare a new machine using the project repositories, and report back my findings.

Thanks so far
Oliver

(In reply to Marko Mäkelä, 16.01.24 10:20)

_______________________________________________
discuss mailing list -- discuss@lists.mariadb.org
To unsubscribe send an email to discuss-leave@lists.mariadb.org
Hello,

for the record: after upgrading to 10.11.6 from the project repository, the error is gone, and replication has stayed in sync for more than two weeks now.

Oliver

(Follow-up to my message of 16.01.24 13:36)
Hi Oliver,

I am happy that 10.11.6 has been stable for you.

There is one more corruption issue, https://jira.mariadb.org/browse/MDEV-33379, which I think is related to the use of O_DIRECT on some copy-on-write file systems, including at least bcachefs and possibly btrfs and others. I will work on it, but the fix will certainly miss the scheduled quarterly releases, which I hope will finally be out this week.

Marko
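To see whether a server could be exposed to the O_DIRECT concern mentioned above, a first step is to check the server version, where the data files live, and which flush method InnoDB uses. This is a hedged diagnostic sketch only; it does not confirm or work around MDEV-33379. The file system type of the data directory must then be checked at the operating-system level (for example with findmnt).

```sql
-- Diagnostic sketch; it only reads server variables and changes nothing.
SELECT VERSION();                -- running server version (e.g. 10.11.6)
SELECT @@datadir;                -- directory holding the InnoDB data files
SELECT @@innodb_flush_method;    -- assumption: O_DIRECT is the Linux default in 10.6 and later
```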
participants (2)
- Marko Mäkelä
- Oliver Welter