The handler API in MySQL and MariaDB operates on a single record at a time. There are some ‘batched’ primitives that, to my understanding, were developed for NDB Cluster, but InnoDB generally does not use them.
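As an illustration, here is a minimal sketch of how the SQL layer drives a full table scan through the handler interface, one record per call. The ha_rnd_* entry points are the real handler API; the loop itself and process_record() are simplifications of what the executor does:

```cpp
// Minimal sketch of a full table scan through the handler API,
// assuming the server-internal headers (sql/handler.h) are available.
int scan_table(handler *h, TABLE *table)
{
  int err = h->ha_rnd_init(/*scan=*/true);
  if (err != 0)
    return err;
  // Each call returns exactly one record into table->record[0];
  // the storage engine has no way to hand back a batch of rows.
  while ((err = h->ha_rnd_next(table->record[0])) == 0)
    process_record(table);  // hypothetical consumer of the record
  h->ha_rnd_end();
  return err == HA_ERR_END_OF_FILE ? 0 : err;
}
```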
The only nontrivial things that InnoDB does during reads are Index Condition Pushdown (ICP) and the end-of-range detection (DsMrr), which were ported from the never-released MySQL 6.0 or 5.2 to MySQL 5.6.
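Roughly, ICP lets the engine evaluate a pushed-down condition on the index columns before constructing the full row, and the end-of-range check lets it stop a scan early. A conceptual sketch (the enum mirrors the server's icp_result; the helpers are hypothetical stand-ins, not actual InnoDB functions):

```cpp
// Conceptual sketch of Index Condition Pushdown and end-of-range
// detection; the helpers below are hypothetical stand-ins.
enum icp_result { ICP_NO_MATCH, ICP_MATCH, ICP_OUT_OF_RANGE };

struct index_record;                                  // stand-in for an index record
bool past_end_of_range(const index_record &);         // hypothetical
bool pushed_condition_matches(const index_record &);  // hypothetical

icp_result icp_check(const index_record &rec)
{
  // End-of-range detection: once past the scanned range, the whole
  // scan can stop instead of returning rows for the SQL layer to drop.
  if (past_end_of_range(rec))
    return ICP_OUT_OF_RANGE;
  // The pushed-down condition is evaluated on index columns only,
  // avoiding a clustered-index (full row) lookup for non-matches.
  return pushed_condition_matches(rec) ? ICP_MATCH : ICP_NO_MATCH;
}
```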
To improve the performance of range scans, InnoDB uses a caching mechanism to prefetch results. Starting with the 4th consecutive read on the same ha_innobase handle, InnoDB prefetches 8 records into row_prebuilt_t::fetch_cache. This has several drawbacks (the heuristic is sketched in code after the list):
- The parameters of this cache are hard-coded, and the cache can guess wrong: for SELECT * FROM t LIMIT 5, it would needlessly read 8 records, possibly causing extra page accesses.
- The caches waste memory, especially on partitioned tables, where each partition handle has its own cache. (The main benefit of ha_innopart, the InnoDB native partitioning in MySQL 5.7, is that the fetch_cache is shared between the partitions of a table. This only saves memory, because the cache is emptied when switching partitions.)
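A minimal sketch of that heuristic, with the hard-coded parameters written out (to my understanding they correspond to MYSQL_FETCH_CACHE_THRESHOLD and MYSQL_FETCH_CACHE_SIZE in the InnoDB source; the surrounding logic here is simplified):

```cpp
// Sketch of the hard-coded prefetch heuristic around
// row_prebuilt_t::fetch_cache (simplified; not the actual code).
constexpr unsigned FETCH_CACHE_THRESHOLD = 4;  // reads before caching kicks in
constexpr unsigned FETCH_CACHE_SIZE      = 8;  // records prefetched per batch

struct prebuilt_sketch {
  unsigned n_rows_fetched = 0;  // consecutive reads on this handle
  unsigned n_fetch_cached = 0;  // records currently sitting in the cache
};

bool should_prefetch(const prebuilt_sketch &p)
{
  // From the 4th consecutive read onwards, fill the cache with up to
  // 8 records, even if the statement (e.g. LIMIT 5) needs fewer.
  return p.n_rows_fetched >= FETCH_CACHE_THRESHOLD && p.n_fetch_cached == 0;
}
```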
As far as I understand, the entire purpose of prebuilt->fetch_cache is to reduce the cost of acquiring the page latch and repositioning the cursor on each read request. If we did not call mtr_t::commit() between requests, maybe we could remove this cache altogether.
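To make that cost concrete, each single-row read that bypasses the cache roughly follows the pattern below. btr_pcur_store_position(), btr_pcur_restore_position() and mtr_t are real InnoDB primitives (whose exact signatures vary between versions), but the flow is a simplified sketch of row_search_mvcc(), and fetch_next_into() is a hypothetical helper:

```cpp
// Simplified per-call pattern when the fetch cache is not used:
// every handler call pays for a mini-transaction, a page latch
// acquisition, and cursor repositioning.
int fetch_one_record(btr_pcur_t *pcur, unsigned char *buf)
{
  mtr_t mtr;
  mtr.start();
  // Re-acquire the page latch and reposition the persistent cursor
  // where the previous call left off.
  btr_pcur_restore_position(BTR_SEARCH_LEAF, pcur, &mtr);
  int err = fetch_next_into(pcur, buf, &mtr);  // hypothetical helper
  // Save the position and release all latches, because the handler
  // API returns control to the SQL layer after every single record.
  btr_pcur_store_position(pcur, &mtr);
  mtr.commit();
  return err;
}
```

If the mini-transaction could stay open across handler calls, the store/restore round trip would disappear, which is what would make removing the cache conceivable.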
When I was in the InnoDB team at MySQL, I brought up the idea of eliminating the fetch_cache a few times, but the optimizer developers were in a separate team, and we would have needed support from both managers. I am hoping that in MariaDB this would merely be a technical challenge: first write a proof-of-concept prototype, then polish and test it.
When I discussed this with the late Olav Sandstå some years ago, there was some thinking that we should stop hard-coding TABLE::record[0] in various parts of the code, and that the storage engine should be able to return multiple records into a larger TABLE::record[] buffer. I wonder if a simpler approach would work, and if we could do without any prefetching: