On Tue, Jan 10, 2017 at 11:49 AM, jan <jan.lindstrom@mariadb.com> wrote:
revision-id: c32bc1afe820aa90c2e2666a77ce2bb43019df3d (mariadb-10.2.3-43-gc32bc1afe82) parent(s): 3d46768da2a784ddc9c341d1fb03468525bd38f1 author: Jan Lindström committer: Jan Lindström timestamp: 2017-01-10 11:36:42 +0200 message:
MDEV-11254: innodb-use-trim has no effect in 10.2
In MySQL 5.7 InnoDB os0file.[h|cc] was refactored making trim or punch hole operation not work with MariaDB.
buf0buf.cc: buf_page_is_corrupted() remember that compressed pages do not contain checksum and FIL_TAIL header.
What is a FIL_TAIL header? I suppose it is referring to the 8 bytes FIL_PAGE_DATA_END at the end of the page frame? And which compressed pages? ROW_FORMAT=COMPRESSED pages do contain a page footer.
buf0dblwr.cc: Set up the page size and write size to IORequest and set up punch_hole() if actual write is not physical size.
Actual write to where? Apparently (looking at the code), it is to the user data file, not to the doublewrite buffer. The commit message should mention this important detail. (I think we should not punch holes to the doublewrite buffer area (pages 64..191 in the system tablespace).)
buf0rea.cc, fil0fil.cc, fil0fil.h: Remove unnecessary parameter write_size from os_file_write, fil_io, os_aio functions.
I see that this parameter does not exist in MySQL 5.7. OK to remove.
os0file.h, os0file.cc: Implement new version of os_file_io_complete to do punch_hole operation and punch hole operation should be based on file system block size determined using fstat or DeviceIOControl.
Please also mention the InnoDB wrapper function os_file_get_block_size(), which was added. But, where is that function being called? Did you forget to adjust fil_node_open_file()? Please submit a revised patch for review. Some detailed comments follow.
+set autocommit=0; +call innodb_insert_proc(16000); +commit; +set autocommit=1;
Do we really have to insert this many records? What would be the minimum test case? Did you test with innodb_page_size=4k or 8k? Consider removing the two SET statements and adding a BEGIN before the CALL instead. That would be more standard SQL.
+--disable_query_log +--disable_warnings +EVAL SET GLOBAL innodb_compression_algorithm = $innodb_compression_algorithm_orig; +EVAL SET GLOBAL innodb_file_per_table = $innodb_file_per_table_orig; +EVAL SET GLOBAL innodb_file_format = $innodb_file_format_orig; +--enable_warnings +--enable_query_log
There is no need to assign innodb_file_per_table or innodb_file_format in 10.2. The defaults are just fine. If you keep adding these statements in 10.2, merging to 10.3 will require extra work after 10.3 removes these deprecated parameters (copying WL#7704 from MySQL 8.0.0).
+ bool page_encrypted = false; + bool page_compressed = false; + ulint page_type = mach_read_from_2(read_buf+FIL_PAGE_TYPE); + + page_compressed = (page_type == FIL_PAGE_PAGE_COMPRESSED_ENCRYPTED + || page_type == FIL_PAGE_PAGE_COMPRESSED);
Please remove the initialization of page_compressed to false. Could we fuse page_compressed into page_encrypted to simplify those conditions that currently check for page_compressed&&page_encrypted?
@@ -978,9 +978,13 @@ buf_dblwr_write_block_to_datafile( ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE); buf_dblwr_check_page_lsn(block->frame);
+ if (bpage->real_size != bpage->size.physical()) { + request.set_punch_hole(); + } + fil_io(request, - sync, bpage->id, bpage->size, 0, bpage->size.physical(), - frame, block, (ulint *)&bpage->write_size); + sync, bpage->id, bpage->size, 0, bpage->real_size, + frame, block); } }
I understand that this is the write to the final location. Sure, we should punch the hole there.
diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc index b82c4db18ad..de534cc6e89 100644 --- a/storage/innobase/buf/buf0flu.cc +++ b/storage/innobase/buf/buf0flu.cc @@ -1093,11 +1093,15 @@ buf_flush_write_block_low(
ulint type = IORequest::WRITE | IORequest::DO_NOT_WAKE;
- IORequest request(type); + IORequest request(type, bpage->size, (ulint *)&bpage->write_size); + + if (bpage->real_size != bpage->size.physical()) { + request.set_punch_hole(); + }
Could we pass bpage as a parameter to the IORequest::IORequest(), and also include the set_punch_hole() call there? I think that buf_page_t is already too big, and we should not bloat it with a 64-bit field for write_size. The reason that I separated buf_page_t from buf_block_t was that for ROW_FORMAT=COMPRESSED, the block descriptors would use less space. Do we even need write_size in buf_page_t? I think it would better belong to buf_block_t. That is because ROW_FORMAT=COMPRESSED is never used together with hole-punching.
@@ -37,6 +37,7 @@ Created 10/21/1995 Heikki Tuuri #define os0file_h
#include "univ.i" +#include "page0size.h"
#include "page0size.h" also does #include "univ.i"
#ifndef _WIN32 #include <dirent.h> @@ -554,6 +555,8 @@ class IORequest { /** Default constructor */ IORequest() : + m_page_size(univ_page_size), + m_write_size(NULL), m_block_size(UNIV_SECTOR_SIZE), m_type(READ), m_compression()
We only have to add a single buf_page_t* const parameter. Maybe we could also remove m_block_size?
+ /** + @param[in] type Request type, can be a value that is + ORed from the above enum + @param[in] page_size Page size + @param[in] write_size Actual write size */ + IORequest(ulint type, + const page_size_t& page_size, + ulint* write_size) + : + m_page_size(page_size), + m_write_size(write_size), + m_block_size(UNIV_SECTOR_SIZE), + m_type(static_cast<uint16_t>(type)), + m_compression() + { + if (is_log()) { + disable_compression(); + } + + if (!is_punch_hole_supported()) { + clear_punch_hole(); + } + }
I do not think we need this constructor.
+ /** operator = + @param[in] from Source to copy */ + IORequest& operator=(const IORequest& from) + { + m_page_size.copy_from(from.m_page_size); + m_write_size = from.m_write_size; + m_block_size = from.m_block_size; + m_type = from.m_type; + m_compression = from.m_compression; + return (*this); + }
What is wrong with the compiler-generated assignment operator? If it complains that there is no page_size_t::operator=(), we can easily let it be generated by the compiler. What is this assignment needed for?
+ /** Copy constructor. + @param[in] from source. */ + IORequest(const IORequest& from) + : m_page_size(from.m_page_size.physical(), + from.m_page_size.logical(), + from.m_page_size.is_compressed()), + m_write_size(from.m_write_size), + m_block_size(from.m_block_size), + m_type(from.m_type), + m_compression(from.m_compression) + { + } +
What is this needed for? Why cannot the compiler generate this? As far as I can tell, this is just a bitwise copy.
@@ -688,6 +742,22 @@ class IORequest { m_block_size = static_cast<uint32_t>(block_size); }
+ /** Set the write size variable for later use. + @param[in] write_size Write size variable */ + void write_size(ulint* write_size) + { + m_write_size = write_size; + } + + /** Set the actual write size done in IO + @param[in] write_size Write size to set */ + void write_size(ulint write_size) + { + if (m_write_size) { + *m_write_size = write_size; + } + } +
The two methods do two completely different things, and at the very least they should have distinctive names. Do we need the first method at all?
+/***********************************************************************//** +Try to get number of bytes per sector from file system. +@return file block size */ +UNIV_INTERN +ulint +os_file_get_block_size( +/*===================*/ + os_file_t file, /*!< in: handle to a file */ + const char* name); /*!< in: file name */
Do not use /*****/ or /*====*/ or /*!< comments in new code. This function appears to be unused.
-/** Decompress after a read and punch a hole in the file if it was a write +/** Punch a hole in the file if it was a write @param[in] type IO context @param[in] fh Open file handle -@param[in,out] buf Buffer to transform -@param[in,out] scratch Scratch area for read decompression -@param[in] src_len Length of the buffer before compression -@param[in] len Compressed buffer length for write and size - of buf len for read +@param[in] len Compressed buffer length for write @return DB_SUCCESS or error code */ static dberr_t os_file_io_complete( - const IORequest&type, + IORequest& type, os_file_t fh, - byte* buf, - byte* scratch, - ulint src_len, ulint offset, ulint len);
Why is the const qualifier removed?
@@ -832,6 +825,69 @@ os_aio_simulated_handler( void** m2, IORequest* type);
+/***********************************************************************//** +Try to get number of bytes per sector from file system. +@return file block size */ +UNIV_INTERN +ulint +os_file_get_block_size( +/*===================*/ + os_file_t file, /*!< in: handle to a file */ + const char* name) /*!< in: file name */ +{ + ulint fblock_size = 512; + +#if defined(UNIV_LINUX) + struct stat local_stat; + int err; + + err = fstat((int)file, &local_stat); + + if (err != 0) { + ib::warn() << "fstat() failed on file " << name + << " error "<< errno << " : " << strerror(errno); + os_file_handle_error_no_exit(name, "fstat()", FALSE); + } else { + fblock_size = local_stat.st_blksize; + } +#endif /* UNIV_LINUX */
Why only Linux? fstat() is already in POSIX.1-2001.
@@ -953,8 +1009,8 @@ class AIOHandler { ut_a(slot->type.is_read() || !slot->skip_punch_hole);
return(os_file_io_complete( - slot->type, slot->file, slot->buf, - slot->compressed_page, slot->original_len, + const_cast<IORequest&>(slot->type), + slot->file, static_cast<ulint>(slot->offset), slot->len)); }
Please never use const_cast in a context where the object can be modified. It is unclear to me why we would need to lose the constness of IORequest here. (The members of IORequest do not need changes. Writes through pointers in IORequest are allowed even if IORequest itself is const.)
@@ -1659,85 +1715,78 @@ os_file_read_string( + ulint amount = trim_len / type.block_size(); + switch(type.block_size()) { + case 512: + srv_stats.page_compression_trim_sect512.add(amount); + break;
Can we store the log2 of the block_size somewhere and use an array of compression attributes, to avoid so much repeated code?
@@ -4252,6 +4300,7 @@ os_file_punch_hole_win32( fh, FSCTL_SET_ZERO_DATA, &punch, sizeof(punch), NULL, 0, &temp, NULL);
+ return(!result ? DB_IO_NO_PUNCH_HOLE : DB_SUCCESS); }
One empty line is enough. Marko -- DON’T MISS M|17 April 11 - 12, 2017 The Conrad Hotel New York City https://m17.mariadb.com/ Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation