Hi Sergey,



On Sun, Sep 13, 2020 at 7:41 PM Sergey Petrunia <sergey@mariadb.com> wrote:
Hi Varun,

I was looking at the code for MDEV-21829, trying to understand whether there
are any issues with how Unique object was extended to handle variable-size
keys.

== TREE object (the RB tree) ==

TREE (the structure and functions that operate on it) support variable-sized
keys, however the keys "are expected to know" their size.

tree_insert() has "uint key_size" argument. This way, tree_insert() knows
how many bytes to allocate and copy when putting the key into the tree.

Well for the tree for fixed size we also set the size during the initialization phase
init_tree() function has 
   tree->size_of_element= size > 0 ? (uint) size : 0;

For Unique with fixed size keys we use init_tree to pass the fixed size and then we send 0 as the argument
to key_size in tree_insert().
The allocated size used in tree_insert is: 
   alloc_size=sizeof(TREE_ELEMENT)+key_size+tree->size_of_element;
 
The rest of the TREE code doesn't seem to care about the size of the keys.

When TREE code needs to compare a couple of keys, it will call TREE::compare
callack. The callback only receives key pointers as arguments, that is, it is
expected to infer key size from the contents of the key.

The same happens when when walking the tree. Only "uchar *key" is passed to
tree_walk_action callback. The walk function has to figure out they key size
on its own.

== Unique ==

Unique class cannot do the same what TREE functions function do. It needs to be
aware of the sizes of the values it is storing:
- It will need to pass value size to tree_insert()
- It will need to know value size to write it into tmp file.
- It will need to read it back when merging.

How does it know it?
Unique::merge() and Unique::walk() with merge_walk() functions do make certain
assumptions about the data format being used.

== The review points ==

Is this bad?

I think tight coupling between the Unique object and the specifics of the
data format it is storing is bad. What is worse is that currently this coupling
is implicit, it is not immediately apparent to somebody just looking at the
code.

One important thing to keep in mind here is when Unique is flushed to disk then the code that does the merging of the different chunks of file and returning the results shares the code with filesort merge procedure.
So I think either we move the code for Unique separately or we will have to make our data format be similar to the one used by packed sort keys
for filesort.

Currently the format for packed sort-keys is like:

<total_key_length><keypart_1_null_byte><keypart_1_length><field_1_value>......................................................<field_N_value>

where keypart_i_length is optional and only stored for packable key parts like CHAR, VARCHAR.

 
Possible ways out:
1. Explicitly declare in Unique class that variable-sized records must use
  certain data format. 

2. Isolate the Unique from the specifics of the data format used.

2A. Let Unique get the size of the key as argument of unique_add() and then
  keep it together with the key. This creates some RAM/storage overhead.

2B. Pass a read_size() callback function which would get the size from the
data?

 The way packed keys are used is that the length is added as 4 bytes to the record
in the beginning and then a static function is used to read its length whenever needed.
Also I moved the implementation for packed keys in a separate class so that we can extend unique in the future too if needed.


Regards,
Varun
 
BR
 Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog



_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp