Hi Otto, On Wed, Jan 22, 2025 at 3:07 AM Otto Kekäläinen via developers <developers@lists.mariadb.org> wrote:
Has anyone attempted or seen any existing implementations "physical" replication between two MariaDB instances, where a secondary read-only instance would follow the primary by directly reading and applying page changes from the primary's redo logs?
That makes me think of Amazon Aurora, which to my understanding uses physical replication, possibly by storing log snippets in a key-value store, indexed by the tablespace identifier and page number. The storage underneath the page cache would apparently support fetching pages as of a specific LSN. Freezing the LSN is what would provide a consistent view. (Alternatively, a multi-page read mini-transaction could be retried if one of the pages had been modified during the execution.)
The secondary instance would start (and restart if necessary) using a full copy of the data from the primary instance, but while running keep itself current by reading redo logs from the primary instance, and apply pages based on the increasing LSN number. I call this "physical" replication to distinguish it from normal logical replication based on e.g. binlogs, and this would obviously only work for InnoDB.
Theoretically, this could be fairly easily implemented for a read-only replica of a DML-only workload. As soon as any writes (including the purge of committed transaction history, or any change buffer merge) are enabled, the replica would start to produce log on its own and its LSN would necessarily diverge from the source. Related to this, in xtrabackup but not in mariadb-backup you will find the option --apply-log-only. In mariadb-backup we have some debug assertions in place to ensure that incremental backups will not be broken. The only situation where mariadb-backup --prepare is writing log records or modifying any data on its own is when it is being invoked with the --export option and it needs to clean up the .ibd files that are being exported. When it comes to DDL, starting with MySQL 8.0, I would expect all changes to be made durable via the InnoDB write-ahead log. The crash-safe DDL starting with MariaDB Server 10.6 works differently, using a separate file ddl_recovery.log that covers operations on .frm files, which form part of the data dictionary. Not all DDL operations are atomic; for example DROP DATABASE is being executed as a sequence of DROP TABLE operations. I believe that any hot backup or replication solution when multiple log files are being used will require some additional locking. If everything was made durable via a single write-ahead log file, it should be straightforward to replicate just that one file, without any locking being necessary. When it comes to binary logs and replication, an improvement is being worked on. https://jira.mariadb.org/browse/MDEV-34705 aims to implement an option to write the binlog through the InnoDB ib_logfile0. That should also make the problematic setting sync_binlog=0 (https://jira.mariadb.org/browse/MDEV-16589) obsolete. Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB plc