I am implementing binlog GTID indexes, to fix an old performance regression for GTID when slaves connect to the master. I now have a solid design and a working prototype, so I wanted to describe the work to encourage early comments and suggestions. I'm hoping this will make it into the next release (where is the place to actively follow/participate in the release planning?).

Ever since I implemented GTID, a connecting slave has needed to sequentially scan the leading portion of one binlog file to locate the GTID position to start at. This can be slow, as binlog files default to 1GB in size, and even more so now that binlog files can be encrypted. I filed MDEV-4991 for this 10 years ago, and it is embarrassing that it has somehow remained unfixed for so long :-/ But now I'm fixing it. The current implementation is in the work-in-progress branch knielsen_mdev4991:

  https://github.com/MariaDB/server/tree/knielsen_mdev4991

The code is already functional, with the testsuite passing. Still missing are binlog purge, index crash recovery, the async write path, and general testing and cleanup.

The basic idea is, for each binlog file master-bin.000001, to write an index file master-bin.000001.idx. The index contains a B+-Tree in which the keys are pairs of (GTID state, binlog offset). A connecting slave's GTID position can be looked up quickly in the tree to find the corresponding binlog offset (and gtid_binlog_state) to start replicating at. Similarly, the reverse lookup is fast: for a non-GTID connecting slave, or for BINLOG_GTID_POS(), the starting offset can be looked up to obtain the corresponding GTID position.

The index is written out to disk concurrently with the writing of the binlog file. Since the tree is written in sorted order and append-only, the implementation is significantly simpler than a general B+-Tree; it can also be described as a Log-structured Merge Tree with a B-Tree search structure on top (see the first sketch below).

All GTID state keys in the index (except the first) are delta-compressed, storing only the GTIDs that changed since the last record (which will typically be only one). Additionally, the index is sparse, storing only, say, 1-in-10 GTIDs (this will be configurable, of course). Using a sparse index reduces the disk space used, at the cost only of scanning a few extra events of the binlog file to find the exact position requested (see the second sketch below). Actual benchmarking of space usage is TBD, but we can expect a typical binlog file of 1 GB containing, say, 3 million GTIDs to require something like 10 MB of index disk space (a 1% increase).

The file format is page-based (unlike the binlog file), so that it can be written efficiently and easily read with random access. The index is written in parallel with the binlog file, but no extra fsync()s are needed (except one at the end of the file, when the index is closed and synced to disk). The writing of the index will happen asynchronously from the binlog background thread; this way it will have minimal impact on the performance and scalability of the binlog and the contended LOCK_log. Connecting slaves can read from the currently-being-written ("hot") GTID index by accessing internal memory buffers of pages not yet written to disk.

If the server crashes, the existing binlog recovery and binlog checkpoint mechanism will be used to re-create any incomplete indexes as part of the normal recovery scan of binlog files.
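To make the append-only point concrete, here is a first, minimal C++ sketch. It is not the actual server code; the names (AppendOnlyBtreeWriter, IndexKey, KEYS_PER_PAGE), the page capacity, and the key layout are all invented for illustration. The idea it shows: because keys arrive in binlog order, the writer only ever keeps the rightmost node of each tree level in memory; when a node fills up, it is flushed to the next free page of the .idx file and its first key is pushed into the parent level.

  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <vector>

  struct IndexKey {          // hypothetical key layout
    std::string gtid_delta;  // GTIDs changed since the previous record (delta-compressed)
    uint64_t binlog_offset;  // byte offset into master-bin.NNNNNN
  };

  struct Node {              // one page worth of keys on one tree level
    std::vector<IndexKey> keys;
    std::vector<uint32_t> child_pages;  // only used on interior levels
  };

  class AppendOnlyBtreeWriter {
    static constexpr size_t KEYS_PER_PAGE = 128;  // assumed page capacity
    std::vector<Node> spine;    // spine[0] = leaf level; rightmost node per level
    uint32_t next_page_no = 0;

    // Stub: pretend to write the page to the .idx file, return its page number.
    uint32_t flush_page(const Node &n) { (void)n; return next_page_no++; }

    // Flush the full node at `level` and push its first key into the parent.
    void overflow(size_t level) {
      Node full = spine[level];
      spine[level] = Node();
      uint32_t page_no = flush_page(full);
      if (level + 1 == spine.size())
        spine.push_back(Node());          // tree grows a new root level
      Node &parent = spine[level + 1];
      parent.keys.push_back(full.keys.front());
      parent.child_pages.push_back(page_no);
      if (parent.keys.size() >= KEYS_PER_PAGE)
        overflow(level + 1);
    }

  public:
    AppendOnlyBtreeWriter() { spine.push_back(Node()); }

    // Called (for a subset of GTIDs) as the binlog is written, in order.
    void append(const IndexKey &key) {
      spine[0].keys.push_back(key);
      if (spine[0].keys.size() >= KEYS_PER_PAGE)
        overflow(0);
    }

    // On binlog rotation: flush every remaining non-root level, root page last.
    void close() {
      for (size_t level = 0; level < spine.size(); level++)
        if (!spine[level].keys.empty() && level + 1 < spine.size())
          overflow(level);
      flush_page(spine.back());           // root
    }
  };

  int main() {
    AppendOnlyBtreeWriter w;
    for (uint64_t off = 0; off < 1000000; off += 500)
      w.append({"0-1-" + std::to_string(off / 500), off});  // fake GTIDs/offsets
    w.close();
    return 0;
  }

The only point of the sketch is that sorted, append-only insertion removes the node splitting and rebalancing that a general B+-Tree needs, which is what makes the index cheap to write alongside the binlog.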
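And here is a second sketch, of the lookup side, showing why a sparse index is sufficient. Again this is only an illustration with invented names (Entry, index_lookup, find_start_pos), using in-memory vectors as stand-ins for the .idx file and the binlog, and a single sequence number in place of a full delta-compressed GTID state: the index lookup returns the last indexed (GTID, offset) pair at or before the requested position, and the few remaining events are scanned in the binlog itself.

  #include <cstddef>
  #include <cstdint>
  #include <cstdio>
  #include <vector>

  // A single (GTID, offset) point; a real key is a full delta-compressed
  // GTID state, but one sequence number is enough to show the principle.
  struct Entry {
    uint64_t seq_no;         // GTID sequence number reached at this point
    uint64_t binlog_offset;  // corresponding byte offset in the binlog file
  };

  // Sparse index lookup: binary search for the last entry with seq_no <= wanted.
  uint64_t index_lookup(const std::vector<Entry> &index, uint64_t wanted) {
    uint64_t offset = 0;  // empty/missing index -> scan from start of the file
    size_t lo = 0, hi = index.size();
    while (lo < hi) {
      size_t mid = (lo + hi) / 2;
      if (index[mid].seq_no <= wanted) {
        offset = index[mid].binlog_offset;
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return offset;
  }

  // "Binlog" scan: from the offset the index gave us until the wanted GTID
  // is found. With a 1-in-10 sparse index, at most ~9 entries are read here.
  uint64_t find_start_pos(const std::vector<Entry> &binlog,
                          const std::vector<Entry> &index, uint64_t wanted) {
    uint64_t offset = index_lookup(index, wanted);
    for (const Entry &ev : binlog)
      if (ev.binlog_offset >= offset && ev.seq_no == wanted)
        return ev.binlog_offset;
    return 0;  // not found: position lies in another binlog file
  }

  int main() {
    std::vector<Entry> binlog, index;
    for (uint64_t i = 1; i <= 100; i++) {
      binlog.push_back({i, i * 500});     // pretend each event is 500 bytes
      if (i % 10 == 1)
        index.push_back(binlog.back());   // sparse: every 10th GTID indexed
    }
    std::printf("start at offset %llu\n",
                (unsigned long long)find_start_pos(binlog, index, 57));
    return 0;
  }

With 1-in-10 sparseness the extra scan is a handful of events at most, which is negligible compared to scanning from the start of a 1 GB file, and it is also what gives the graceful fallback described next: with no index at all, the scan simply starts from offset 0 as today.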
If a binlog index should somehow be found corrupt or missing (this could happen e.g. after an upgrade from an old server, or with a backup script that omits the new .idx files), the code will gracefully fall back to the old (slower) way of sequentially scanning the binlog file to locate the slave starting position.

I think this will be a great (if rather late) improvement to one remaining corner of the GTID implementation that currently has sub-optimal performance, and I'm excited to get this completed, reviewed/tested, and hopefully added to the next MariaDB release.

 - Kristian.