Marko Mäkelä <marko.makela@mariadb.com> writes:
On Fri, Aug 25, 2023 at 10:16 AM Kristian Nielsen via commits <commits@lists.mariadb.org> wrote:
Checksums cannot be pre-computed when binlog encryption is enabled, as
Have you thought about a format change that would address these issues?
What I though about was how to pre-encrypt without a format change. Currently, binlog encryption encrypts each event separately using a nonce built from the binlog file (12 random bytes) and the event (4 bytes of end_log_pos). The end_log_pos is easy, we can just put an increasing counter in there, and then fix the value after decryption. The problem is the per-binlog-file nonce. If a binlog rotation happens between transaction start and commit, we will have encrypted with the wrong nonce. In this case we can decrypt and re-encrypt, assuming most transactions will not span a binlog rotation. Or we could rotate this part of the nonce less often (I don't think anything says it _has_ to be for each binlog rotation). Binlog encryption is not something that I find very useful or interesting. It's one of those features that hurt more than it helps. If it was not there, users would be happy. Now that it's there, they will think they need to use it, and suffer complexity and limitations. Full-disk encryption seems much simpler, more robust, and more complete. But with this patch series, pre-encryption shouldn't be too hard to add. I think it needs input from whoever thinks binlog encryption is useful, since their requirements for the nonce will be needed to decide how to implement it.
In MDEV-14425 https://github.com/MariaDB/server/commit/685d958e38b825ad9829be311f26729cccf... I redesigned the InnoDB log file format in a way that allows checksums to be computed before the final position of the records (the log sequence number of LSN) is known. For encryption, an additional 64-bit nonce will be written; this currently is the LSN at the time the
In binlog we already have the end_log_pos, which is mostly an unused field. It's 32 bit, not sure if that would be considered sufficient for a nonce.
Thanks to MDEV-27774, not only InnoDB computes checksums while not holding any mutex, but multiple threads can copy their mini-transactions to the shared log_sys.buf while holding a shared
For the binlog, the stmt/trx caches are written directly to the binlog file under LOCK_log, from a single thread serving multiple pending transactions. I somehow doubt that this is a current scalability bottleneck, something that would be interesting to investigate though. It also seems doubtful that the kernel would scale with multiple threads appending to a file concurrently. The binlog is a very naive implementation and not well designed for a scalable transaction log currently. I think the InnoDB redo log is somewhat different here.
+/* Convenience macros since ulong is 32 or 64 bit depending on platform. */ +#if SIZEOF_LONG == 4 +#define my_atomic_storeul(P, D) my_atomic_store32((int32 volatile *)(P), (D))
In all currently supported MariaDB Server versions, C++11 is available. Is there a particular reason why you would not use std::atomic<size_t> here?
Yes. The binlog_checksum_options variable needs to be of ulong type to be usable with the MYSQL_SYSVAR_ENUM() macro to define the --binlog-checksum configuration option. As far as I could determine, std::atomic does not provide any non-atomic access to the underlying storage location to obtain the ulong * required. The alternative would be to change the implementation of the --binlog-checksum sysvar to somehow allow a std::atomic as the backing store. I thought this was more complex and out of scope of this task, but don't have a strong opinion one way or the other. (The ulong type and sys_var was always a source of grief, with ulong being different size on different platforms and so on). - Kristian.