Hi Varun,
I was looking at the code for MDEV-21829, trying to understand whether there are any issues with how Unique object was extended to handle variable-size keys.
== TREE object (the RB tree) ==
TREE (the structure and functions that operate on it) support variable-sized keys, however the keys "are expected to know" their size.
tree_insert() has "uint key_size" argument. This way, tree_insert() knows how many bytes to allocate and copy when putting the key into the tree.
Well for the tree for fixed size we also set the size during the initialization phase init_tree() function has
Hi Sergey, On Sun, Sep 13, 2020 at 7:41 PM Sergey Petrunia <sergey@mariadb.com> wrote: tree->size_of_element= size > 0 ? (uint) size : 0; For Unique with fixed size keys we use init_tree to pass the fixed size and then we send 0 as the argument to key_size in tree_insert(). The allocated size used in tree_insert is: alloc_size=sizeof(TREE_ELEMENT)+key_size+tree->size_of_element;
The rest of the TREE code doesn't seem to care about the size of the keys.
When TREE code needs to compare a couple of keys, it will call TREE::compare callack. The callback only receives key pointers as arguments, that is, it is expected to infer key size from the contents of the key.
The same happens when when walking the tree. Only "uchar *key" is passed to tree_walk_action callback. The walk function has to figure out they key size on its own.
== Unique ==
Unique class cannot do the same what TREE functions function do. It needs to be aware of the sizes of the values it is storing: - It will need to pass value size to tree_insert() - It will need to know value size to write it into tmp file. - It will need to read it back when merging.
How does it know it? Unique::merge() and Unique::walk() with merge_walk() functions do make certain assumptions about the data format being used.
== The review points ==
Is this bad?
I think tight coupling between the Unique object and the specifics of the data format it is storing is bad. What is worse is that currently this coupling is implicit, it is not immediately apparent to somebody just looking at the code.
One important thing to keep in mind here is when Unique is flushed to disk then the code that does the merging of the different chunks of file and returning the results shares the code with filesort merge procedure. So I think either we move the code for Unique separately or we will have to make our data format be similar to the one used by packed sort keys for filesort. Currently the format for packed sort-keys is like: <total_key_length><keypart_1_null_byte><keypart_1_length><field_1_value>......................................................<field_N_value> where keypart_i_length is optional and only stored for packable key parts like CHAR, VARCHAR.
Possible ways out: 1. Explicitly declare in Unique class that variable-sized records must use certain data format.
2. Isolate the Unique from the specifics of the data format used.
2A. Let Unique get the size of the key as argument of unique_add() and then keep it together with the key. This creates some RAM/storage overhead.
2B. Pass a read_size() callback function which would get the size from the data?
The way packed keys are used is that the length is added as 4 bytes to the record in the beginning and then a static function is used to read its length whenever needed. Also I moved the implementation for packed keys in a separate class so that we can extend unique in the future too if needed.
Regards, Varun
BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp