Re: [Maria-developers] Sachin weekly report

Hi Sergei!

Actually, I have completed the work on update and delete; they now use the index for looking up records.

But I have made a lot of changes in the optimizer which may break it, and there are also lots of queries where my code does not work; fixing all of this might take a long time. So I am thinking of a change in my existing code. Suppose a table t1:

    create table t1 (a blob, b blob, c blob, unique(a,b,c));

In the current code, for such a table there is a KEY with only one keypart, which points to the field DB_ROW_HASH_1. That was fine for normal updates, inserts and deletes, but for WHERE optimization I have to do a lot of work: first match the fields (as in add_key_part), then check whether all the fields in hash_str are present in the WHERE clause or not, then create keys by calculating the hash. I do this by checking the HA_UNIQUE_HASH flag in KEY, but this (I think) also makes the optimizer code worse because of too much dependence on it. I also need to patch the get_mm_parts and get_mm_leaf functions, which I think should not be patched.

I am thinking of another approach to this problem at the server level: instead of having just one keypart, we can have 1+3 keyparts. The last three keyparts will be for fields a, b, c and the first one for DB_ROW_HASH_1. These will exist only at the server level, not at the storage level. key_info->key_part will point at the keypart containing field a, while the keypart holding DB_ROW_HASH_1 will sit at index -1. This way I do not have to patch much more of the optimizer code.

But there is one problem: what should the length of each key_part be? I am thinking of making it equal to field->pack_length(). On its own that would not work, because when creating keys the optimizer calls get_key_image() (which copies real data, so it can exceed pack_length() in the case of a blob). To make it work I have to patch the optimizer where it calls get_key_image() and check whether the key is HA_UNIQUE_HASH; if yes, then instead of get_key_image() just use memcpy(key, field->ptr, field->pack_length()); this won't copy the actual blob data, but we do not need the actual data there.

I will also patch handler methods like ha_index_read, ha_index_idx_read and multi_range_read_info_const, basically the handler methods related to index or range search. In these methods I need to calculate the hash, which I can compute from key_ptr; but key_ptr does not hold the actual data (in the case of blobs etc.). So, to get the data for the hash, I will make clones of the fields (a, b, c, etc.) whose ptr points into key_ptr. Then field->val_str() will simply work and I can calculate the hash. I can also compare the returned result with the actual key in the handler method itself.

What do you think of this approach?

Regards
sachin

On Sat, Aug 20, 2016 at 11:16 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On Aug 19, Sachin Setia wrote:
On Fri, Aug 19, 2016 at 2:42 PM, Sergei Golubchik <serg@mariadb.org> wrote:
First. I believe you'll need to do your final evaluation soon, and it will need to have a link to the code. Did you check Google's guidelines about it? Is everything clear there? Do you need help publishing your work in the format that Google requires?
They don't accept delays for any reason, so even if your code is not 100% complete and ready, you'd better still publish it and submit the evaluation, because otherwise Google will fail you and that'd be too sad.
If you'd like, you can publish, the Google way, only the unique-constraint part without the further optimizer work. Or at least please mention that you completed the original project and went on to work on extensions. I mean, it's better than saying "the code is not 100% complete" :)
Okay, I am thinking of writing a blog post with a link to my GitHub repository. Blog link: <http://sachin1001gsoc.blogspot.in/2016/08/gsoc-2016.html> Please check it.
I think that'll do, yes.
Regards,
Sergei
Chief Architect MariaDB
and security@mariadb.org
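As an aside on the WHERE-optimization flow Sachin describes at the top of this message, here is a small standalone C++ sketch (an editor's model with hypothetical types and a hypothetical row_hash() helper, not MariaDB source): the hash key over (a,b,c) is only usable when the WHERE clause binds every field it covers, and the lookup value is then a hash computed from those bound values, playing the role of DB_ROW_HASH_1.

    #include <cstdio>
    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for the stored hash: a hash of the
    // concatenated field values, in the role of DB_ROW_HASH_1.
    static size_t row_hash(const std::vector<std::string> &values) {
        std::string concat;
        for (const std::string &v : values) concat += v + '\0';
        return std::hash<std::string>{}(concat);
    }

    int main() {
        std::vector<std::string> key_fields = {"a", "b", "c"};
        std::map<std::string, std::string> where =
            {{"a", "x"}, {"b", "y"}, {"c", "z"}};

        // The hash key is usable only if WHERE binds every field in it.
        std::vector<std::string> values;
        for (const std::string &f : key_fields) {
            auto it = where.find(f);
            if (it == where.end()) {
                printf("hash key not usable for this WHERE\n");
                return 0;
            }
            values.push_back(it->second);
        }
        printf("lookup key (hash) = %zu\n", row_hash(values));
        return 0;
    }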

Hi, Sachin!

On Aug 22, Sachin Setia wrote:
Hi Sergei!
Actually, I have completed the work on update and delete; they now use the index for looking up records.
But I have made a lot of changes in the optimizer which may break it, and there are also lots of queries where my code does not work; fixing all of this might take a long time. So I am thinking of a change in my existing code. Suppose a table t1: create table t1 (a blob, b blob, c blob, unique(a,b,c)); In the current code, for such a table there is a KEY with only one keypart, which points to the field DB_ROW_HASH_1. That was fine for normal updates, inserts and deletes, but for WHERE optimization I have to do a lot of work: first match the fields (as in add_key_part), then check whether all the fields in hash_str are present in the WHERE clause or not, then create keys by calculating the hash. I do this by checking the HA_UNIQUE_HASH flag in KEY, but this (I think) also makes the optimizer code worse because of too much dependence on it. I also need to patch the get_mm_parts and get_mm_leaf functions, which I think should not be patched.
Later today I'll know exactly what you mean, when I finish reviewing your optimizer changes. But for now, let's say I agree on a general principle :) The optimizer is kinda complex and fragile, so it's good to avoid doing many changes in it - the effect might be difficult to predict.
I am thinking of another approach to this problem at the server level: instead of having just one keypart, we can have 1+3 keyparts. The last three keyparts will be for fields a, b, c and the first one for DB_ROW_HASH_1. These will exist only at the server level, not at the storage level. key_info->key_part will point at the keypart containing field a, while the keypart holding DB_ROW_HASH_1 will sit at index -1. This way I do not have to patch much more of the optimizer code. But there is one problem: what should the length of each key_part be? I am thinking of making it equal to field->pack_length(). On its own that would not work, because when creating keys the optimizer calls get_key_image() (which copies real data, so it can exceed pack_length() in the case of a blob). To make it work I have to patch the optimizer where it calls get_key_image() and check whether the key is HA_UNIQUE_HASH; if yes, then instead of get_key_image() just use memcpy(key, field->ptr, field->pack_length()); this won't copy the actual blob data, but we do not need the actual data there. I will also patch handler methods like ha_index_read, ha_index_idx_read and multi_range_read_info_const, basically the handler methods related to index or range search. In these methods I need to calculate the hash, which I can compute from key_ptr; but key_ptr does not hold the actual data (in the case of blobs etc.). So, to get the data for the hash, I will make clones of the fields (a, b, c, etc.) whose ptr points into key_ptr. Then field->val_str() will simply work and I can calculate the hash. I can also compare the returned result with the actual key in the handler method itself. What do you think of this approach?
Looks simpler, agree. The length of the keypart should not matter, because it should never be used. Maybe it would be good to set it to -1, as it might help to catch errors (where it is erroneously used). I didn't understand why you need to clone fields though :(

Regards,
Sergei
Chief Architect MariaDB
and security@mariadb.org
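A standalone C++ sketch of the keypart layout agreed on above, with KeyPart as a simplified stand-in for the server's KEY_PART_INFO (an editor's model under those assumptions, not MariaDB source): the hidden hash keypart sits first in the array, key_info->key_part points at the first user column, and the hash keypart is then reachable at index -1.

    #include <cstdio>

    // Simplified stand-in for KEY_PART_INFO.
    struct KeyPart { const char *field_name; int length; };

    int main() {
        // Physical order: DB_ROW_HASH_1 first, then the user columns a, b, c.
        KeyPart parts[] = { {"DB_ROW_HASH_1", 8}, {"a", 0}, {"b", 0}, {"c", 0} };

        // key_info->key_part would point at the first user column...
        KeyPart *key_part = parts + 1;

        // ...so the hash keypart is found at index -1, as proposed.
        KeyPart *hash_part = key_part - 1;

        printf("user keyparts: %s %s %s\n", key_part[0].field_name,
               key_part[1].field_name, key_part[2].field_name);
        printf("hidden hash keypart: %s\n", hash_part->field_name);
        return 0;
    }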

Hi Sergei!

On Tue, Aug 23, 2016 at 4:35 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On Aug 22, Sachin Setia wrote:
Hi Sergei!
Actually, I have completed the work on update and delete; they now use the index for looking up records.
But I have made a lot of changes in the optimizer which may break it, and there are also lots of queries where my code does not work; fixing all of this might take a long time. So I am thinking of a change in my existing code. Suppose a table t1: create table t1 (a blob, b blob, c blob, unique(a,b,c)); In the current code, for such a table there is a KEY with only one keypart, which points to the field DB_ROW_HASH_1. That was fine for normal updates, inserts and deletes, but for WHERE optimization I have to do a lot of work: first match the fields (as in add_key_part), then check whether all the fields in hash_str are present in the WHERE clause or not, then create keys by calculating the hash. I do this by checking the HA_UNIQUE_HASH flag in KEY, but this (I think) also makes the optimizer code worse because of too much dependence on it. I also need to patch the get_mm_parts and get_mm_leaf functions, which I think should not be patched.
Later today I'll know exactly what you mean, when I finish reviewing your optimizer changes.
But for now, let's say I agree on a general principle :) The optimizer is kinda complex and fragile, so it's good to avoid doing many changes in it - the effect might be difficult to predict.
I am thinking of another approach to this problem at the server level: instead of having just one keypart, we can have 1+3 keyparts. The last three keyparts will be for fields a, b, c and the first one for DB_ROW_HASH_1. These will exist only at the server level, not at the storage level. key_info->key_part will point at the keypart containing field a, while the keypart holding DB_ROW_HASH_1 will sit at index -1. This way I do not have to patch much more of the optimizer code. But there is one problem: what should the length of each key_part be? I am thinking of making it equal to field->pack_length(). On its own that would not work, because when creating keys the optimizer calls get_key_image() (which copies real data, so it can exceed pack_length() in the case of a blob). To make it work I have to patch the optimizer where it calls get_key_image() and check whether the key is HA_UNIQUE_HASH; if yes, then instead of get_key_image() just use memcpy(key, field->ptr, field->pack_length()); this won't copy the actual blob data, but we do not need the actual data there. I will also patch handler methods like ha_index_read, ha_index_idx_read and multi_range_read_info_const, basically the handler methods related to index or range search. In these methods I need to calculate the hash, which I can compute from key_ptr; but key_ptr does not hold the actual data (in the case of blobs etc.). So, to get the data for the hash, I will make clones of the fields (a, b, c, etc.) whose ptr points into key_ptr. Then field->val_str() will simply work and I can calculate the hash. I can also compare the returned result with the actual key in the handler method itself. What do you think of this approach?
Looks simpler, agree. The length of the keypart should not matter, because it should never be used. Maybe it would be good to set it to -1, as it might help to catch errors (where it is erroneously used).
I think it should, because we size the buffer according to key_part->length. For example, this code in test_quick_select():

    param.min_key= (uchar*) alloc_root(&alloc, max_key_len);

where max_key_len is the sum of all key_part->store_length values. And in get_mm_leaf() we use:

    field->get_key_image(str+maybe_null, key_part->length, key_part->image_type);

So I think the length will matter.
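A minimal standalone model of the sizing logic referenced here (simplified types, with malloc() standing in for the server's alloc_root(); the lengths are illustrative): the buffer is sized as the sum of the keyparts' store_length values, which is why the length recorded in each keypart matters for allocation.

    #include <cstdio>
    #include <cstdlib>

    // Simplified stand-in for KEY_PART_INFO.
    struct KeyPart { unsigned store_length; };

    int main() {
        KeyPart parts[] = { {12}, {12}, {12} };   // illustrative lengths
        unsigned max_key_len = 0;
        for (const KeyPart &kp : parts)
            max_key_len += kp.store_length;       // as in test_quick_select()

        // The server does: param.min_key= (uchar*) alloc_root(&alloc, max_key_len);
        unsigned char *min_key = (unsigned char *) malloc(max_key_len);
        printf("allocated %u bytes for min_key\n", max_key_len);
        free(min_key);
        return 0;
    }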
I didn't understand why you need to clone fields though :(
Ohh, this is actually just for reading data from key_ptr: because key_ptr does not hold the data directly, I thought I had to make a clone of the field and then set its ptr. But I guess this existing function would work instead:

    inline void move_field(uchar *ptr_arg, uchar *null_ptr_arg, uchar null_bit_arg)
    {
      ptr= ptr_arg;
      null_ptr= null_ptr_arg;
      null_bit= null_bit_arg;
    }
Regards sachin
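A standalone sketch of the move_field() idea above, with BlobField as a simplified stand-in for the server's Field_blob and an illustrative key-buffer layout (a length followed by a pointer, roughly what the proposed memcpy of field->ptr would produce): once the field's ptr is repointed into the key buffer, val_str() can reach the real blob data, so a hash could be computed from it without cloning the field. The layout is an assumption for illustration, not the server's exact format.

    #include <cstdio>
    #include <cstring>
    #include <string>

    // Simplified stand-in for Field_blob: ptr points at a stored
    // (length, data-pointer) pair, not at the blob data itself.
    struct BlobField {
        unsigned char *ptr = nullptr;
        void move_field(unsigned char *ptr_arg) { ptr = ptr_arg; }
        std::string val_str() const {
            unsigned len;
            const char *data;
            memcpy(&len, ptr, sizeof(len));                   // stored length
            memcpy(&data, ptr + sizeof(len), sizeof(data));   // stored pointer
            return std::string(data, len);                    // the real data
        }
    };

    int main() {
        const char *blob_data = "actual blob contents";
        unsigned len = (unsigned) strlen(blob_data);

        // Build a key buffer the way memcpy(key, field->ptr, pack_length())
        // would: length bytes plus the data pointer, no data copied.
        unsigned char key_buf[sizeof(unsigned) + sizeof(const char *)];
        memcpy(key_buf, &len, sizeof(len));
        memcpy(key_buf + sizeof(len), &blob_data, sizeof(blob_data));

        BlobField f;
        f.move_field(key_buf);          // repoint ptr into the key buffer
        printf("val_str() sees: %s\n", f.val_str().c_str());
        return 0;
    }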

Hi, Sachin!

On Aug 23, Sachin Setia wrote:
Looks simpler, agree. The length of the keypart should not matter, because it should never be used. Maybe it would be good to set it to -1, as it might help to catch errors (where it is erroneously used).
I think it should, because we size the buffer according to key_part->length. For example, this code in test_quick_select():

    param.min_key= (uchar*) alloc_root(&alloc, max_key_len);

where max_key_len is the sum of all key_part->store_length values. And in get_mm_leaf() we use:

    field->get_key_image(str+maybe_null, key_part->length, key_part->image_type);

So I think the length will matter.
For normal keys, yes. But for your HA_UNIQUE_HASH keys the buffer size is not the sum of key_part lengths. So using key_part->length to calculate the buffer size for HA_UNIQUE_HASH is wrong, I'd say.

Regards,
Sergei
Chief Architect MariaDB
and security@mariadb.org
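To make this point concrete, a standalone sketch with an illustrative flag value, sizes, and a hypothetical key_buffer_size() helper (none of these are the server's real constants or functions): for a unique-hash key the search buffer only needs room for the stored hash, while a normal key needs the sum of its keyparts' store lengths.

    #include <cstdio>

    enum { HA_UNIQUE_HASH = 1 };              // illustrative flag value

    struct KeyPart { unsigned store_length; };
    struct Key { unsigned flags; unsigned parts; const KeyPart *key_part; };

    // Hypothetical helper: buffer size a lookup on this key would need.
    static unsigned key_buffer_size(const Key &k) {
        if (k.flags & HA_UNIQUE_HASH)
            return 8;                         // just the stored hash value
        unsigned total = 0;
        for (unsigned i = 0; i < k.parts; i++)
            total += k.key_part[i].store_length;
        return total;
    }

    int main() {
        KeyPart parts[] = { {300}, {300}, {300} };   // three wide blob keyparts
        Key hash_key   = { HA_UNIQUE_HASH, 3, parts };
        Key normal_key = { 0,              3, parts };
        printf("hash key buffer:   %u bytes\n", key_buffer_size(hash_key));
        printf("normal key buffer: %u bytes\n", key_buffer_size(normal_key));
        return 0;
    }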
participants (2)
- Sachin Setia
- Sergei Golubchik