Welcome to the thread, Andrei!

For everybody: we just had a productive chat with Andrei, and I'd like to outline the results.


Let's take a look at binlog_commit, or more precisely at MYSQL_BIN_LOG::trx_group_commit_leader,
which is called from there (follow the binlog_commit_flush_trx_cache
 -> THD::binlog_flush_pending_rows_event
 -> MYSQL_BIN_LOG::flush_and_set_pending_rows_event
 -> MYSQL_BIN_LOG::write_transaction_to_binlog call chain).

Committing is currently strictly sequential. Transactions are organized into
groups, and once the last transaction of a group has been acknowledged in the first commit phase,
the leader commits all of them in a chosen order.

However, even here we could write the transactions in parallel while preserving their order.
Andrei also claims that the order could potentially be restored on the replication side.
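To illustrate what "writing in parallel while preserving the order" could look like, here is a minimal C++ sketch. It assumes the leader already knows the commit order and the size of every transaction in the group; `write_group_in_parallel` is a hypothetical helper, not existing server code, and a plain memory buffer stands in for the binlog file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper, not MariaDB code: the group commit leader lays out
// the group in commit order, then the transactions are copied concurrently.
std::string write_group_in_parallel(const std::vector<std::string>& group)
{
  // Step 1 (leader, sequential and cheap): each transaction's offset is the
  // sum of the lengths of all transactions committed before it.
  std::vector<std::size_t> offset(group.size());
  std::size_t total= 0;
  for (std::size_t i= 0; i < group.size(); i++)
  {
    offset[i]= total;
    total+= group[i].size();
  }
  // Sanity check: the last transaction ends exactly at the end of the group.
  assert(group.empty() || offset.back() + group.back().size() == total);

  // Step 2 (parallel): every writer owns a disjoint byte range, so the
  // writers need no lock between each other.
  std::vector<char> out(total);
  char* base= out.data();
  std::vector<std::thread> writers;
  for (std::size_t i= 0; i < group.size(); i++)
    writers.emplace_back([&group, &offset, base, i] {
      if (!group[i].empty())
        std::memcpy(base + offset[i], group[i].data(), group[i].size());
    });

  // Step 3: the group may be published only after all copies are done.
  for (std::thread& t : writers)
    t.join();
  return std::string(out.begin(), out.end());
}
```

The only sequential part is the prefix sum over the lengths; the copying itself can scale with the number of writers, and the result is byte-for-byte what a sequential write in commit order would produce.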

Either way, we technically can't send a transaction to the replication slave before the
binlog flush & fsync; nevertheless, the data will still be kept in the volatile append cache.

There was MDEV-20925 about storing the transaction length in the event, but unfortunately
it was rejected:
> I will be closing this issue because we have COMMIT/ROLLBACK query log event 
> in the end of transaction , whose size is difficult to determine , 
> So current plan is to do MDEv-19687 without transaction length.

The replication team decided to calculate the transaction sizes while receiving the data
in the IO thread and to store them in a hash, so that no protocol modifications would be required.
I suggested buffering each transaction separately and then pushing it into the relay
log as a single data frame that stores the length.
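A minimal C++ sketch of the framing I suggested. The frame layout (a 4-byte little-endian length followed by the transaction payload) and the helpers `append_frame` / `read_frames` are assumptions for illustration only; they are not an existing relay log format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical frame: 4-byte little-endian length, then the whole
// transaction (all of its events, already buffered by the IO thread).
void append_frame(std::string& log, const std::string& trx)
{
  assert(trx.size() <= UINT32_MAX);
  std::uint32_t len= static_cast<std::uint32_t>(trx.size());
  for (int i= 0; i < 4; i++)
    log.push_back(static_cast<char>((len >> (8 * i)) & 0xFF));
  log+= trx;
}

// Splits the log into complete transactions; a trailing partial frame is
// ignored until more data arrives.
std::vector<std::string> read_frames(const std::string& log)
{
  std::vector<std::string> trxs;
  std::size_t pos= 0;
  while (pos + 4 <= log.size())
  {
    std::uint32_t len= 0;
    for (int i= 0; i < 4; i++)
      len|= static_cast<std::uint32_t>(
               static_cast<unsigned char>(log[pos + i])) << (8 * i);
    if (pos + 4 + len > log.size())
      break; // incomplete frame: wait for more data
    trxs.push_back(log.substr(pos + 4, len));
    pos+= 4 + len;
  }
  return trxs;
}
```

With the length stored up front, a reader can hand a whole transaction to a worker, or skip it, without parsing its events, and no separate hash of transaction sizes is needed.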

Then there is MDEV-19687, which is the parent task of MDEV-20925.

Its aim is to parallelize replication on the slave side. They are implementing their
own circular buffer for a single-writer, multiple-reader use case:
the parallel workers will pick up the transaction data from it and resolve the commit order
later.
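The "resolve the commit order later" part can be sketched as a commit gate: workers apply their transactions concurrently, but each one waits for its turn before committing. This is a hypothetical C++ illustration with made-up names (`CommitOrderGate`, `run_workers`), not the design from MDEV-19687, and it leaves out the circular buffer itself.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: transactions are applied in any order, but committed
// strictly in the order of their sequence numbers.
class CommitOrderGate
{
  std::mutex m;
  std::condition_variable cv;
  std::uint64_t next_to_commit= 0;
public:
  // Blocks until every transaction with a smaller seq_no has committed,
  // then runs commit() and lets the next transaction through.
  template <typename F> void commit_in_order(std::uint64_t seq_no, F commit)
  {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return next_to_commit == seq_no; });
    commit();
    next_to_commit++;
    cv.notify_all();
  }
};

// Usage: n workers, started in reverse order on purpose; each records its
// seq_no at commit time, which happens under the gate's mutex.
std::vector<std::uint64_t> run_workers(std::uint64_t n)
{
  CommitOrderGate gate;
  std::vector<std::uint64_t> committed;
  std::vector<std::thread> workers;
  for (std::uint64_t i= n; i-- > 0;)
    workers.emplace_back([&gate, &committed, i] {
      // ... the transaction's events would be applied here, in parallel ...
      gate.commit_in_order(i, [&] { committed.push_back(i); });
    });
  for (std::thread& t : workers)
    t.join();
  return committed;
}
```

Whatever order the workers finish applying in, `committed` always comes out as 0, 1, ..., n-1.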

Sergei Petrunia, you asked:
> I see SEQ_READ_APPEND is only used for Relay Log on the slave. Afaiu, the relay
> log has only one producer, the network io thread.

Andrei clarified that there can actually be multiple sources; however, they are simply going to
create a separate work queue for each source.


===============================================================
To emphasize: the rationale for this IO_CACHE improvement has changed.

* The replication relay log is no longer the use case, since a separate, much simpler
circular buffer is going to be implemented (AFAIU, Sachin is in the middle of working
 on MDEV-19687).

* The binlog commit is a good case instead! Most likely I forgot about it when
I was writing the rationale. Anyway, it wasn't clear to me whether the transactions could
be written in parallel there.
 On the reader side, the network replication sender should not block the committing process
by reading from the binlog.
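For the reader side, one possible shape is sketched below in C++ (all names hypothetical). The committer publishes a "safe to send" end position with a release store, presumably only after flush & fsync, and the sender reads up to that position with an acquire load, never taking the writer's lock. The sketch assumes a fixed-capacity, append-only buffer, so published bytes are never moved or overwritten; a real circular IO_CACHE would presumably also need the writer to wait for slow readers before reusing space.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical sketch: fixed-capacity, append-only log. Writers serialize on
// write_lock; readers only look at bytes below published_end.
class PublishedLog
{
  std::unique_ptr<char[]> buf;
  std::size_t capacity;
  std::size_t write_pos= 0;                  // guarded by write_lock
  std::atomic<std::size_t> published_end{0}; // readers stay below this
  std::mutex write_lock;                     // serializes writers only
public:
  explicit PublishedLog(std::size_t cap) : buf(new char[cap]), capacity(cap) {}

  // Committer: append a transaction, then publish it (in a real binlog this
  // would happen only after flush + fsync). Returns false when full.
  bool append_and_publish(const std::string& trx)
  {
    std::lock_guard<std::mutex> lk(write_lock);
    if (trx.size() > capacity - write_pos)
      return false;
    std::memcpy(buf.get() + write_pos, trx.data(), trx.size());
    write_pos+= trx.size();
    // Release: the copied bytes become visible before the new end position.
    published_end.store(write_pos, std::memory_order_release);
    return true;
  }

  // Sender: copy everything published after `from`. It never touches
  // write_lock, so a slow reader cannot stall the committing threads.
  std::string read_from(std::size_t from) const
  {
    std::size_t end= published_end.load(std::memory_order_acquire);
    assert(from <= end);
    return std::string(buf.get() + from, end - from);
  }
};
```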



Regards,
Nikita