Re: [Maria-developers] dfa33fb309d: MDEV-26664 Store UUIDs in a more efficient manner
Hi, Alexander!

On Oct 19, Alexander Barkov wrote:
revision-id: dfa33fb309d (mariadb-10.6.1-84-gdfa33fb309d)
parent(s): 5cfb31854a4
author: Alexander Barkov
committer: Alexander Barkov
timestamp: 2021-10-19 13:27:19 +0400
message:
MDEV-26664 Store UUIDs in a more efficient manner
UUID values
llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn
are now stored as
nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll
inside the record:
- the groups (segments separated by dash) are reordered right-to-left.
- the bytes inside the groups are not reordered (stored as before, in big-endian format).
This provides a better sorting order: the earlier UUID was generated, the higher it appears in the ORDER BY output.
Also, this change enables a good key prefix compression, because the constant part is now in the beginning, while the non-constant part (the timestamp) is in the end.
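The reordering the commit message describes can be sketched as a small standalone program. This is a hypothetical illustration, not the actual server code: the function and variable names are invented, but the layout rule is the one stated above — the five dash-separated groups of 4+2+2+2+6 bytes are copied in reverse group order, while the bytes inside each group keep their big-endian order. Note the inverse conversion walks the reversed group sizes (6+2+2+2+4).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch of the MDEV-26664 layout change (not server code).
// A UUID  llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn  is stored in the record
// as  nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll : groups reordered
// right-to-left, bytes inside each group unchanged.

// Copy the groups of src into dst in reverse order; lens[] lists the
// group sizes of src and must sum to 16.
static void reverse_groups(const uint8_t src[16], uint8_t dst[16],
                           const size_t lens[5])
{
  size_t src_pos= 0, dst_pos= 16;
  for (int g= 0; g < 5; g++)
  {
    dst_pos-= lens[g];                 // group g lands in the mirrored slot
    memcpy(dst + dst_pos, src + src_pos, lens[g]);
    src_pos+= lens[g];
  }
}

// Group sizes of the two layouts.
static const size_t mem_lens[5]= {4, 2, 2, 2, 6}; // llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn
static const size_t rec_lens[5]= {6, 2, 2, 2, 4}; // nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll

static void uuid_memory_to_record(const uint8_t mem[16], uint8_t rec[16])
{ reverse_groups(mem, rec, mem_lens); }

static void uuid_record_to_memory(const uint8_t rec[16], uint8_t mem[16])
{ reverse_groups(rec, mem, rec_lens); }
```

Because the node/timestamp bytes end up at the front of the record, a plain memcmp() on the record bytes sorts time-ordered UUIDs chronologically, which is the sorting improvement the commit message claims.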
diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid.result
index 6a08ffe8c23..6e2e1cc87ef 100644
--- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid.result
+++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid.result
@@ -596,10 +596,10 @@ INSERT INTO t1 VALUES (CAST(CONCAT('2','0000000-0000-0000-0000-000000000003') AS
 SELECT * FROM t1 ORDER BY a;
 a
 10000000-0000-0000-0000-000000000001
-10000000-0000-0000-0000-000000000002
-10000000-0000-0000-0000-000000000003
 20000000-0000-0000-0000-000000000001
+10000000-0000-0000-0000-000000000002
 20000000-0000-0000-0000-000000000002
+10000000-0000-0000-0000-000000000003
 20000000-0000-0000-0000-000000000003
please, add a test for `ORDER BY CONCAT(a)` just to test the lexicographical ordering of UUIDs. Or maybe `ORDER BY HEX(a)`? Or `UNHEX`? Whatever the recommended approach should be.
 DROP TABLE t1;
 #
diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
index 42bb74d4b01..eb7a51895b9 100644
--- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
+++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
@@ -25,7 +25,7 @@ a
 00000000-0000-0000-0000-0000000000ff
 EXPLAIN SELECT * FROM t1 WHERE a='00000000-0000-0000-0000-0000000000ff';
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	ref	a	a	17	const	2	Using where
+1	SIMPLE	t1	ref	a	a	17	const	4	Using where
why?
diff --git a/plugin/type_uuid/sql_type_uuid.h b/plugin/type_uuid/sql_type_uuid.h
index 9076c874e98..be9fea8ebc9 100644
--- a/plugin/type_uuid/sql_type_uuid.h
+++ b/plugin/type_uuid/sql_type_uuid.h
@@ -24,8 +24,162 @@ class UUID: public FixedBinTypeStorage<MY_UUID_SIZE, MY_UUID_STRING_LENGTH>
...
+  // Compare two in-memory values
+  static int cmp(const LEX_CSTRING &a, const LEX_CSTRING &b)
+  {
this is way more LoC than I would've used. but ok, it's your code and it's a plugin, so as you like
diff --git a/sql/sql_string.h b/sql/sql_string.h
index fe57c8153bb..795f80c3e08 100644
--- a/sql/sql_string.h
+++ b/sql/sql_string.h
@@ -484,6 +484,11 @@ class Binary_string: public Sql_alloc
     if (str.Alloced_length)
       Alloced_length= (uint32) (str.Alloced_length - offset);
   }
+  LEX_CSTRING to_lex_cstring() const
+  {
+    LEX_CSTRING tmp= {Ptr, str_length};
+    return tmp;
+  }
   inline LEX_CSTRING *get_value(LEX_CSTRING *res)
maybe you could remove get_value()? In a separate commit. The name is quite bad and the method makes rather little sense, especially now that there's to_lex_cstring().
  {
    res->str= Ptr;
diff --git a/sql/sql_type_fixedbin.h b/sql/sql_type_fixedbin.h
index c6e3d20bcfa..5141cb9fad4 100644
--- a/sql/sql_type_fixedbin.h
+++ b/sql/sql_type_fixedbin.h
@@ -154,18 +160,13 @@ class FixedBinTypeBundle
                          FbtImpl::max_char_length()+1));
     return false;
   }
-  int cmp(const char *str, size_t length) const
-  {
-    DBUG_ASSERT(length == sizeof(m_buffer));
-    return memcmp(m_buffer, str, length);
-  }
   int cmp(const Binary_string &other) const
what is this one used for?
   {
-    return cmp(other.ptr(), other.length());
+    return FbtImpl::cmp(FbtImpl::to_lex_cstring(), other.to_lex_cstring());
   }
   int cmp(const Fbt &other) const
   {
-    return memcmp(m_buffer, other.m_buffer, sizeof(m_buffer));
+    return FbtImpl::cmp(FbtImpl::to_lex_cstring(), other.to_lex_cstring());
   }
 };
diff --git a/sql/sql_type_fixedbin_storage.h b/sql/sql_type_fixedbin_storage.h
index 0e8ac81ab59..6e18335bd4c 100644
--- a/sql/sql_type_fixedbin_storage.h
+++ b/sql/sql_type_fixedbin_storage.h
@@ -21,6 +21,38 @@
   format and their own (variable size) canonical string representation.
   Examples are INET6 and UUID types.
+
+  The MariaDB server uses three binary representations of a data type:
+
+  1. In-memory binary representation (user visible)
+     This representation:
+     - can be used in INSERT..VALUES (X'AABBCC')
+     - can be used in WHERE conditions: WHERE c1=X'AABBCC'
+     - is returned by CAST(x AS BINARY(N))
+     - is returned by Field::val_native() and Item::val_native()
+
+  2. In-record binary representation (user invisible)
+     This representation:
+     - is used in records (is pointed by Field::ptr)
+     - must be comparable by memcmp()
+
+  3. Binlog binary (row) representation
+     Usually, for string data types the binlog representation
+     is based on the in-record representation with trailing byte compression:
+     - trailing space compression for text string data types
+     - trailing zero compression for binary string data types
+
+  We have to have separate in-memory and in-record representations
+  because we use HA_KEYTYPE_BINARY for indexing. The engine API
+  does not have a way to pass a comparison function as a parameter.
+
+  The default implementation below assumes that:
+  - the in-memory and in-record representations are equal
+  - the binlog representation is compatible with BINARY(N)
+  This is OK for simple data types, like INET6.
+
+  Data type implementations that need different representations
+  can override the default implementation (like e.g. UUID does).
*/
thanks, this is very good
/***********************************************************************/
@@ -64,5 +137,37 @@ class FixedBinTypeStorage
     return true;
   }
+  static ulong KEY_pack_flags(uint column_nr)
+  {
+    /*
+      Return zero by default. A particular data type can override
+      this method return some flags, e.g. HA_PACK_KEY to enable
+      key prefix compression.
what if you change it later, will it make old tables corrupted?
+    */
+    return 0;
+  }
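The override mechanism this hunk introduces can be sketched outside the server tree. All class names here are invented for illustration, and the flag value is assumed (the real HA_PACK_KEY lives in the server's headers): a data type opts into key prefix compression by shadowing the static method of the storage base class.

```cpp
#include <cassert>

typedef unsigned long ulong;
typedef unsigned int uint;

// Assumed value for this self-contained sketch; the real HA_PACK_KEY
// constant is defined in the MariaDB server headers.
static const ulong HA_PACK_KEY_SKETCH= 1;

// Hypothetical stand-in for FixedBinTypeStorage: no packing by default.
struct FixedBinTypeStorage_sketch
{
  static ulong KEY_pack_flags(uint column_nr)
  {
    (void) column_nr;
    return 0;
  }
};

// Hypothetical stand-in for UUID: its record layout puts the constant
// part first, so prefix compression pays off; request it via the flag.
struct UUID_sketch: public FixedBinTypeStorage_sketch
{
  static ulong KEY_pack_flags(uint column_nr)
  {
    (void) column_nr;
    return HA_PACK_KEY_SKETCH;
  }
};
```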
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
Hello Sergei,

Thanks for your review. I've pushed a new patch version to bb-10.7-bar-uuid.
Please see comments inline:

On 10/20/21 2:48 AM, Sergei Golubchik wrote:
 20000000-0000-0000-0000-000000000002
+10000000-0000-0000-0000-000000000003
 20000000-0000-0000-0000-000000000003
please, add a test for `ORDER BY CONCAT(a)` just to test the lexicographical ordering of UUIDs. Or maybe `ORDER BY HEX(a)`? Or `UNHEX`? Whatever the recommended approach should be.
I propose to recommend using CAST(uuid AS BINARY(16)) for lexicographical order. This CAST only performs byte reordering (conversion from the in-record to the in-memory format), while CONCAT and HEX additionally involve a BINARY(16) -> VARCHAR(32) conversion through UUID::to_string(). The former should be faster.
 DROP TABLE t1;
 #
diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
index 42bb74d4b01..eb7a51895b9 100644
--- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
+++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
@@ -25,7 +25,7 @@ a
 00000000-0000-0000-0000-0000000000ff
 EXPLAIN SELECT * FROM t1 WHERE a='00000000-0000-0000-0000-0000000000ff';
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	ref	a	a	17	const	2	Using where
+1	SIMPLE	t1	ref	a	a	17	const	4	Using where
why?
Hash collision now happens differently, which affects the optimizer estimation for certain values.

Note, UUID::hash_record() is NOT used by the HEAP engine, as it might seem. UUID::hash_record() is used only by partitioning. So the partitioning distribution has not changed, because Field_fbt::hash() uses UUID::hash_record(), which hashes bytes in the "user visible" (in-memory) order. I did this for partition distribution compatibility with the BINARY(16) data type.

But the HEAP distribution has changed, because HEAP still uses my_charset_bin.hash_sort() directly on the entire 16-byte in-record value, which is now different. We could preserve the old HEAP distribution by introducing a special CHARSET_INFO based on my_charset_bin, but overriding hash_sort(). But I think it's not really necessary.

I checked on a bigger table, and it looks fine. For some constants it returns "2 rows". For some constants it returns "4 rows". So it did not change for the worse. Just a coincidence.
diff --git a/plugin/type_uuid/sql_type_uuid.h b/plugin/type_uuid/sql_type_uuid.h
index 9076c874e98..be9fea8ebc9 100644
--- a/plugin/type_uuid/sql_type_uuid.h
+++ b/plugin/type_uuid/sql_type_uuid.h
@@ -24,8 +24,162 @@ class UUID: public FixedBinTypeStorage<MY_UUID_SIZE, MY_UUID_STRING_LENGTH>
...
+  // Compare two in-memory values
+  static int cmp(const LEX_CSTRING &a, const LEX_CSTRING &b)
+  {
this is way more LoC than I would've used. but ok, it's your code and it's a plugin, so as you like
diff --git a/sql/sql_string.h b/sql/sql_string.h
index fe57c8153bb..795f80c3e08 100644
--- a/sql/sql_string.h
+++ b/sql/sql_string.h
@@ -484,6 +484,11 @@ class Binary_string: public Sql_alloc
     if (str.Alloced_length)
       Alloced_length= (uint32) (str.Alloced_length - offset);
   }
+  LEX_CSTRING to_lex_cstring() const
+  {
+    LEX_CSTRING tmp= {Ptr, str_length};
+    return tmp;
+  }
   inline LEX_CSTRING *get_value(LEX_CSTRING *res)
maybe you could remove get_value()? In a separate commit. The name is quite bad and the method makes rather little sense, especially now that there's to_lex_cstring().
I am all for it. Hope Monty won't mind — he added get_value().
  {
    res->str= Ptr;
diff --git a/sql/sql_type_fixedbin.h b/sql/sql_type_fixedbin.h
index c6e3d20bcfa..5141cb9fad4 100644
--- a/sql/sql_type_fixedbin.h
+++ b/sql/sql_type_fixedbin.h
@@ -154,18 +160,13 @@ class FixedBinTypeBundle
                          FbtImpl::max_char_length()+1));
     return false;
   }
-  int cmp(const char *str, size_t length) const
-  {
-    DBUG_ASSERT(length == sizeof(m_buffer));
-    return memcmp(m_buffer, str, length);
-  }
   int cmp(const Binary_string &other) const
what is this one used for?
It's just a convenience wrapper. It is used only in stored_field_cmp_to_item() for now.
+  static ulong KEY_pack_flags(uint column_nr)
+  {
+    /*
+      Return zero by default. A particular data type can override
+      this method return some flags, e.g. HA_PACK_KEY to enable
+      key prefix compression.
what if you change it later, will it make old tables corrupted?
I did not check. This method is used only in mysql_prepare_create_table(). Then the flags are stored inside the FRM. In theory the change should not make old tables corrupted. Should I try with different engines?

Your question also made me think about what the default should be. Probably we should do it the other way around:
- enable compression by default
- disable compression in INET6
What do you think?
+    */
+    return 0;
+  }
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
Hi, Alexander!

On Oct 21, Alexander Barkov wrote:
On 10/20/21 2:48 AM, Sergei Golubchik wrote:
 20000000-0000-0000-0000-000000000002
+10000000-0000-0000-0000-000000000003
 20000000-0000-0000-0000-000000000003
please, add a test for `ORDER BY CONCAT(a)` just to test the lexicographical ordering of UUIDs. Or maybe `ORDER BY HEX(a)`? Or `UNHEX`? Whatever the recommended approach should be.
I propose to recommend using CAST(uuid AS BINARY(16)) for lexicographical order. This CAST only performs byte reordering (conversion from the in-record to the in-memory format), while CONCAT and HEX additionally involve a BINARY(16) -> VARCHAR(32) conversion through UUID::to_string(). The former should be faster.
Fine.
diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
index 42bb74d4b01..eb7a51895b9 100644
--- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
+++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
@@ -25,7 +25,7 @@ a
 00000000-0000-0000-0000-0000000000ff
 EXPLAIN SELECT * FROM t1 WHERE a='00000000-0000-0000-0000-0000000000ff';
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	ref	a	a	17	const	2	Using where
+1	SIMPLE	t1	ref	a	a	17	const	4	Using where
why?
Hash collision now happens differently, which affects the optimizer estimation for certain values. ... So it did not change for the worse. Just a coincidence.
I hope so, although in all changed test results it did change for the worse. Not always doubled, but always increased.
diff --git a/sql/sql_type_fixedbin.h b/sql/sql_type_fixedbin.h
index c6e3d20bcfa..5141cb9fad4 100644
--- a/sql/sql_type_fixedbin.h
+++ b/sql/sql_type_fixedbin.h
+  static ulong KEY_pack_flags(uint column_nr)
+  {
+    /*
+      Return zero by default. A particular data type can override
+      this method return some flags, e.g. HA_PACK_KEY to enable
+      key prefix compression.
what if you change it later, will it make old tables corrupted?
I did not check. This method is used only in mysql_prepare_create_table(). Then flags are stored inside FRM. In theory the change should not make old tables corrupted.
I think this too. Just thought it might be good to verify it. E.g., create a table, flip the flag, recompile and see if it opens.
Your question also made me think about what the default should be. Probably we should do it the other way around:
- enable compression by default
- disable compression in INET6
What do you think?
Enable if NATIVE_LEN > 6? Or > 8?

Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
Hello Sergei,

Thanks, please find an updated patch version in:
https://github.com/MariaDB/server/commits/bb-10.7-bar-uuid
See comments inline:

On 10/21/21 7:47 PM, Sergei Golubchik wrote:
diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
index 42bb74d4b01..eb7a51895b9 100644
--- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
+++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
@@ -25,7 +25,7 @@ a
 00000000-0000-0000-0000-0000000000ff
 EXPLAIN SELECT * FROM t1 WHERE a='00000000-0000-0000-0000-0000000000ff';
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	ref	a	a	17	const	2	Using where
+1	SIMPLE	t1	ref	a	a	17	const	4	Using where
why?
Hash collision now happens differently, which affects the optimizer estimation for certain values. ... So it did not change for the worse. Just a coincidence.
I hope so, although in all changed test results it did change for the worse. Not always doubled, but always increased.
I have added more tests into type_uuid_engines.inc, flipping all bytes (not just the last one), and with a larger total number of records. On this data set EXPLAIN stably returns "2 rows" in the test type_uuid_memory.
diff --git a/sql/sql_type_fixedbin.h b/sql/sql_type_fixedbin.h
index c6e3d20bcfa..5141cb9fad4 100644
--- a/sql/sql_type_fixedbin.h
+++ b/sql/sql_type_fixedbin.h
+  static ulong KEY_pack_flags(uint column_nr)
+  {
+    /*
+      Return zero by default. A particular data type can override
+      this method return some flags, e.g. HA_PACK_KEY to enable
+      key prefix compression.
what if you change it later, will it make old tables corrupted?
I did not check. This method is used only in mysql_prepare_create_table(). Then flags are stored inside FRM. In theory the change should not make old tables corrupted.
I think this too. Just thought it might be good to verify it. E.g., create a table, flip the flag, recompile and see if it opens.
I've added tests into type_uuid_myisam.test, examining the INDEX_LENGTH/DATA_LENGTH ratio for three tables:
- made by CREATE TABLE (i.e. current, with HA_PACK_KEY enabled)
- restored from std_data/t1packkey.frm (with HA_PACK_KEY enabled)
- restored from std_data/t1nopackkey.frm (without HA_PACK_KEY)
It opens t1nopackkey.frm without problems, and detects that compression is really disabled.
Your question also made me think about what the default should be. Probably we should do it the other way around:
- enable compression by default
- disable compression in INET6
What do you think?
Enable if NATIVE_LEN > 6? Or > 8?
I guess the minimum length to enable compression is for the engine to decide.

Note, Type_handler_string, Type_handler_varchar and Type_handler_blob_common return HA_PACK_KEY unconditionally. So if we want to enable compression by default, FixedBinTypeStorage::KEY_pack_flags() should possibly return HA_PACK_KEY unconditionally as well.

On the other hand, it's not necessary to decide now. We can collect more data types and decide later.
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org