This is somewhat better, but I think we should still clean up the code a little more. On Mon, Jan 23, 2017 at 1:32 PM, jan <jan.lindstrom@mariadb.com> wrote:
fil0fil.cc: Remove unneeded status call and add test is spare files and punch hole supported by file system where tablespace is created. Add call to get file system block size. Used file node is added to IORequest. Added functions to check is punch hole supported and setting punch hole.
s/is spare files/if sparse files/ s/where/when/
dberr.h: Add error code if punch hole operation fails.
I think that it would be better to explicitly mention the error code here. The name of the code matters more than the file name where it is declared.
os0file.h: Remove unneeded m_block size. Add m_bpage to know actual size of the block and m_fil_node to know tablespace file system block size and does it support punch hole.
Mention the class or struct name too. There could be multiple classes declared in the file os0file.h.
os0file.cc: Add function to do punch_hole operation, get the file system block size and determine does file system support sparse files (for punch hole).
What is the name of the added function?
+--disable_warnings +let $innodb_compression_algorithm_orig=`SELECT @@innodb_compression_algorithm`; +let $innodb_file_format_orig = `SELECT @@innodb_file_format`; +let $innodb_file_per_table_orig = `SELECT @@innodb_file_per_table`; + +SET GLOBAL innodb_file_format = `Barracuda`; +SET GLOBAL innodb_file_per_table = ON; +--enable_warnings +--enable_query_log
In 10.2, do not add unnecessary references to the deprecated variable innodb_file_format, and do not bother to mess with innodb_file_per_table. The defaults are fine as is.
+let $wait_condition= SELECT variable_value > 0 FROM information_schema.global_status WHERE variable_name = 'innodb_num_pages_page_compressed'; +--source include/wait_condition.inc + +let $wait_condition= SELECT variable_value > 0 FROM information_schema.global_status WHERE variable_name = 'innodb_num_page_compressed_trim_op'; +--source include/wait_condition.inc + +SELECT VARIABLE_VALUE > 0 FROM information_schema.global_status where variable_name = 'INNODB_NUM_PAGES_PAGE_COMPRESSED'; +SELECT VARIABLE_VALUE > 0 FROM information_schema.global_status where variable_name = 'INNODB_NUM_PAGE_COMPRESSED_TRIM_OP';
Can we wait for a single condition here, using AND in the WHERE condition? I do not think that we need the subsequent SELECT statements.
+/** +Should we punch hole to deallocate unused portion of the page. +@param[in] bpage Page control block +@return true if punch hole should be used, false if not */ +bool +buf_page_should_punch_hole( + const buf_page_t* bpage) +{ + return (bpage && bpage->real_size != bpage->size.physical()); +} + +/** +Calculate the length of trim (punch_hole) operation. +@param[in] bpage Page control block +@param[in] write_length Write length +@return length of the trim or zero. */ +ulint +buf_page_get_trim_length( + const buf_page_t* bpage, + ulint write_length) +{ + return (bpage ? bpage->size.physical() - write_length : 0); +}
Please declare these as methods of buf_page_t, and move the NULL check to the caller.
@@ -912,7 +912,7 @@ buf_dblwr_write_block_to_datafile( type |= IORequest::DO_NOT_WAKE; }
- IORequest request(type); + IORequest request(type, bpage);
/* We request frame here to get correct buffer in case of encryption and/or page compression */ @@ -924,7 +924,7 @@ buf_dblwr_write_block_to_datafile( fil_io(request, sync, bpage->id, bpage->size, 0, bpage->size.physical(), (void*) frame, - (void*) bpage, NULL); + (void*) bpage); } else { ut_ad(!bpage->size.is_compressed());
It feels redundant that we are passing bpage both in the IORequest request and directly to the fil_io() call. One parameter should be enough. And there should be no need for the parameters that are derived directly from bpage. Can we have a separate fil_io() call for the redo log, and a separate one for buffer pool blocks?
+/** +Get should we punch hole to tablespace. +@param[in] node File node +@return true, if punch hole should be tried, false if not. */ +bool +fil_node_should_punch_hole( + const fil_node_t* node) +{ + return (node && node->space && node->space->punch_hole); +}
Who would call this with node=NULL? And when would node->space ever be NULL for a valid fil_node_t?
+/** +Set punch hole to tablespace to given value. +@param[in] node File node +@param[in] val value to be set. */ +void +fil_space_set_punch_hole( + fil_node_t* node, + bool val) +{ + if (node && node->space) { + node->space->punch_hole = val; + } +}
Also here, I would remove the unnecessary if condition.
diff --git a/storage/innobase/include/os0api.h b/storage/innobase/include/os0api.h new file mode 100644 index 00000000000..ea2a113bdec --- /dev/null +++ b/storage/innobase/include/os0api.h
Instead of creating a new file, can we please move these declarations into an existing file, such as fsp0types.h? The file name prefix is misleading, because there is nothing operating system dependent about the contents.
diff --git a/storage/innobase/include/os0file.h b/storage/innobase/include/os0file.h index 57ee015dfdd..cd016a46972 100644 --- a/storage/innobase/include/os0file.h +++ b/storage/innobase/include/os0file.h @@ -36,7 +36,8 @@ Created 10/21/1995 Heikki Tuuri #ifndef os0file_h #define os0file_h
-#include "univ.i" +#include "page0size.h" +#include "os0api.h"
#ifndef _WIN32 #include <dirent.h> @@ -46,8 +47,10 @@ Created 10/21/1995 Heikki Tuuri
/** File node of a tablespace or the log data space */ struct fil_node_t; +struct fil_space_t;
extern bool os_has_said_disk_full; +extern my_bool srv_use_trim;
/** Number of pending read operations */ extern ulint os_n_pending_reads;
Instead of duplicating declarations, please add #include directives.
@@ -177,6 +180,8 @@ static const ulint OS_FILE_ERROR_MAX = 200; #define IORequestLogRead IORequest(IORequest::LOG | IORequest::READ) #define IORequestLogWrite IORequest(IORequest::LOG | IORequest::WRITE)
+ + /** The IO Context that is passed down to the low level IO code */ class IORequest {
Do not add empty lines.
+ /** @return true if punch hole should be used */ + bool punch_hole() const + MY_ATTRIBUTE((warn_unused_result)) + { + return((m_type & PUNCH_HOLE) == PUNCH_HOLE); + } +
Comparing against 0 should be more efficient.
- /** @return the block size to use for IO */ - ulint block_size() const - MY_ATTRIBUTE((warn_unused_result)) + /** Set the pointer to file node for IO + @param[in] node File node */ + void set_fil_node(fil_node_t* node)
Consider removing this method, and setting this in the constructor. Why fil_node_t and not fil_space_t?
+ /** @return true if punch hole is supported */ + static bool is_punch_hole_supported() + { + + /* In this debugging mode, we act as if punch hole is supported, + and then skip any calls to actually punch a hole here. + In this way, Transparent Page Compression is still being tested. */ + DBUG_EXECUTE_IF("ignore_punch_hole", + return(true); + );
Please add a test that uses SET DEBUG_DBUG='+d,ignore_punch_hole'.
+#if defined(HAVE_FALLOC_PUNCH_HOLE_AND_KEEP_SIZE) || defined(_WIN32) + return(true); +#else + return(false); +#endif /* HAVE_FALLOC_PUNCH_HOLE_AND_KEEP_SIZE || _WIN32 */ + }
This looks wrong design to me. IORequest::is_punch_hole_supported() should depend on the fil_node_t and the actual file system capabilities.
+ void space_no_punch_hole(void) { + fil_space_set_punch_hole(m_fil_node, false); + }
I would prefer more straight code, such as m_fil_node->punch_hole = false; or m_fil_node->clear_punch_hole();
+/** Check if the file system supports sparse files. + +Warning: On POSIX systems we try and punch a hole from offset 0 to +the system configured page size. This should only be called on an empty +file. + +Note: On Windows we use the name and on Unices we use the file handle. + +@param[in] name File name +@param[in] fh File handle for the file - if opened +@return true if the file system supports sparse files */ +bool +os_is_sparse_file_supported( + const char* path, + os_file_t fh) + MY_ATTRIBUTE((warn_unused_result));
Is it possible to use the file handle on Windows? Or can we define two different functions, one for Windows and another for POSIX?
+/** Free storage space associated with a section of the file. +@param[in] fh Open file handle +@param[in] off Starting offset (SEEK_SET) +@param[in] len Size of the hole +@return DB_SUCCESS or error code */ +dberr_t +os_file_punch_hole( + IORequest& type, + os_file_t fh, + os_offset_t off, + os_offset_t len) + MY_ATTRIBUTE((warn_unused_result)); + +/** Free storage space associated with a section of the file. +@param[in] fh Open file handle +@param[in] off Starting offset (SEEK_SET) +@param[in] len Size of the hole +@return DB_SUCCESS or error code */ +dberr_t +os_file_punch_hole( + os_file_t fh, + os_offset_t off, + os_offset_t len) + MY_ATTRIBUTE((warn_unused_result));
The parameter 'type' of the first function is not defined. Why do we need two very similar-looking functions? Could one of them be defined inline as a wrapper of the other?
@@ -156,9 +157,6 @@ class page_size_t {
private:
- /* Disable implicit copying. */ - void operator=(const page_size_t&); - /* For non compressed tablespaces, physical page size is equal to the logical page size and the data is stored in buf_page_t::frame (and is also always equal to univ_page_size (--innodb-page-size=)).
Also reimplement page_size_t::copy_from() as { *this = src; }
@@ -737,6 +740,19 @@ os_file_handle_error_no_exit( const char* operation, bool silent);
+/** Punch a hole in the file if it was a write +@param[in] type IO context +@param[in] fh Open file handle +@param[in] len Compressed buffer length for write +@return DB_SUCCESS or error code */ +static +dberr_t +os_file_io_complete( + IORequest& type, + os_file_t fh, + ulint offset, + ulint len);
Can this be made a method of IORequest?
+ if (err != 0) { + ib::warn() << "fstat() failed on file " << name + << " error "<< errno << " : " << strerror(errno); + os_file_handle_error_no_exit(name, "fstat()", FALSE);
Remove the ib::warn(). One error report should be enough, just like we would do it on Windows.
+ ib::warn() + << "fallocate(" << fh + <<", FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, " + << off << ", " << len << ") returned errno: " + << errno;
Do we need this detailed reporting? What use is the numeric value of the file handle to the user? A file name would be much more useful. And ideally, this error should be pushed to the client.
+ /* Deallocate unused blocks from file system. + This is newer done to page 0 or to log files.*/
s/newer/never/. However, during .ibd file creation, we do punch a hole to page 0.
+ if (offset > 0 + && !type.is_log() + && type.is_write() + && type.punch_hole()) { + *err = os_file_io_complete( + const_cast<IORequest&>(type), + file, + static_cast<ulint>(offset), + n);
Remove the const_cast, and properly change all IORequest parameters to non-const.
@@ -4680,7 +4913,7 @@ os_file_pwrite( (void) my_atomic_addlint(&os_n_pending_writes, 1); MONITOR_ATOMIC_INC(MONITOR_OS_PENDING_WRITES);
- ssize_t n_bytes = os_file_io(type, file, const_cast<void*>(buf), + ssize_t n_bytes = os_file_io(type, file, (void*)buf, n, offset, err);
Avoid C-style casts at any cost. Can we split the os_file_io() into os_file_read() and os_file_write() somehow? How much common code is there actually between the two? Also, could that function be made a method of IORequest?
@@ -5195,6 +5429,31 @@ os_file_read_no_error_handling_func( +/** NOTE! Use the corresponding macro os_file_write(), not directly +Requests a synchronous write operation. +@param[in] type IO flags +@param[in] file handle to an open file +@param[out] buf buffer from which to write +@param[in] offset file offset from the start where to read +@param[in] n number of bytes to read, starting from offset +@return DB_SUCCESS if request was successful, false if fail */ +dberr_t +os_file_write_func( + IORequest& type, + const char* name, + os_file_t file, + const void* buf, + os_offset_t offset, + ulint n)
Can this be made a method of IORequest?
diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc index 5d478e4529f..4b862cc6267 100644 --- a/storage/innobase/srv/srv0srv.cc +++ b/storage/innobase/srv/srv0srv.cc @@ -184,7 +184,7 @@ my_bool srv_use_native_aio = TRUE; my_bool srv_numa_interleave = FALSE; /* If this flag is TRUE, then we will use fallocate(PUCH_HOLE) to the pages */ -UNIV_INTERN my_bool srv_use_trim = FALSE; +UNIV_INTERN my_bool srv_use_trim = TRUE; /* If this flag is TRUE, then we disable doublewrite buffer */ UNIV_INTERN my_bool srv_use_atomic_writes = FALSE; /* If this flag IS TRUE, then we use this algorithm for page compressing the pages */
Omit the redundant initialization and let the variable live in the BSS segment. Global parameters will be initialized via the table in ha_innodb.cc. -- DON’T MISS M|17 April 11 - 12, 2017 The Conrad Hotel New York City https://m17.mariadb.com/ Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation