On Sat, May 20, 2023 at 11:07 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
I agree that a function parameter seems simpler. We're requesting to be notified when a specific commit is durable in the engine, better specify _which_ commit than for InnoDB to guess, as you say :-).
Given that this code is only executed when switching binlog files (every 1 gigabyte or so of written binlog), or during RESET MASTER, the slight performance degradation due to the "unnecessary but sufficient" condition on InnoDB should not matter much.
We need to be careful about lifetime. The transaction may no longer exist as a THD or trx_t (?) in memory. But innobase_commit_ordered() could return the corresponding LSN, as was suggested in another mail, and then commit_checkpoint_request() could pass that value.
Right. The trx_t objects are being allocated from a special memory pool that facilitates fast reuse. The object of an old transaction could have been reused for something else. So, it would be better to let the storage engine somehow return its logical time. 64 bits for that could be sufficient for all storage engines, and the special value 0 could imply that the current logic (ask the storage engine to write everything) will be used. [snip]
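To illustrate the idea above, here is a minimal single-threaded sketch in C++. It is not InnoDB code or the actual handlerton interface: ToyEngine, its members, and the signatures shown are hypothetical and only borrow the names commit_ordered() and commit_checkpoint_request() from this discussion. The assumption is that the engine hands out a 64-bit logical time (for InnoDB this would be an LSN) at commit, and that the caller later passes that value back, with 0 meaning "make everything written so far durable", as in the current logic.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical toy engine, for illustration only. No locking: a real
// engine would need atomics or a mutex around these counters.
struct ToyEngine {
  uint64_t written_lsn = 0;  // logical time of the latest commit (0 = none yet)
  uint64_t durable_lsn = 0;  // everything up to this time is durable
  unsigned flushes = 0;      // number of simulated log flushes

  // Returns the logical time of this commit. The value is never 0, so 0
  // stays free as the "flush everything" marker.
  uint64_t commit_ordered() { return ++written_lsn; }

  // Requests durability up to `lsn`; 0 means everything written so far.
  // Returns true if a flush was needed, false if the commit was already
  // durable. The caller never touches a transaction object, so reuse of
  // that object by a later transaction is harmless.
  bool commit_checkpoint_request(uint64_t lsn) {
    const uint64_t target =
        (lsn == 0) ? written_lsn : std::min(lsn, written_lsn);
    if (durable_lsn >= target)
      return false;          // nothing to do
    durable_lsn = target;    // stand-in for a log write plus fsync
    ++flushes;
    return true;
  }
};
```

In this sketch the caller only has to remember one integer per binlog file, for example the value returned for the last commit written to the old file, and pass it in when the binlog is rotated.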
Yes. The single-engine transaction is surely the important use case to optimise. It's nice if multi-engine transactions still work, but if they still require multiple fsync calls, I think that's perfectly fine, and not something to allocate a lot of resources to optimise for.
It's nice that we agree here.
I also had the idea to use fibers/coroutines, as is mentioned in the MDEV description, but if that can be avoided, so much the better.
I too like https://quoteinvestigator.com/2011/05/13/einstein-simple/ or the KISS principle. Example: If the buf_pool.mutex or fil_system.mutex is a bottleneck, fix the bottlenecks (MDEV-15053, MDEV-23855) instead of introducing complex things such as multiple buffer pool instances and multiple page cleaner threads (removed in MDEV-15058), or introducing a Fil_shard (MySQL 8.0). Or if the log_sys.mutex is a bottleneck, do not introduce a "jungle of threads" that will write to multiple log files, but just reduce the bottlenecks with increased use of std::atomic or with a file format change that allows more clever locking (MDEV-27774).

Thread context switches can be expensive when system calls such as mutex waits are involved, and when not, race conditions in lock-free algorithms are hard to diagnose (often invisible to tools like https://rr-project.org). Even when there are no system calls involved in inter-thread communication, "cache line ping-pong" can quickly become expensive, especially on NUMA systems.

Marko
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc