Hi Sergei!
Actually I completed the work on update and delete. Now they will use index
for looking up records.
But I am thinking I have done a lot of changes in optimizer which may break
it , and also there are lots of queries where my
code does not work, fixing this might take a long amount of time.
I am thinking of a change in my existing code :-
Suppose a table t1
create table t1 (a blob, b blob, c blob, unique(a,b,c));
In current code , for query like there will a KEY with only one keypart
which points to field DB_ROW_HASH_1.
It was okay for normal updates , insert and delete , but in the case of
where optimization I have do a lot of stuff , first to match field (like
in add_key_part), then see whether all the fields in hash_str are present
in where or not, then create keys by calculating hash. I do this by
checking the HA_UNIQUE_HASH flag in KEY , but this also makes (I think)
optimizer code
bad because of too much dependence. Also I need to patch get_mm_parts and
get_mm_leaf function , which I think
should not be patched.
I am thinking of a another approach to this problem at server level instead
of having just one keypart we can have 1+3
keypart. Last three keypart will be for field a, b, c and first one for
DB_ROW_HASH_1 .These will be only at server level not at
storage level. key_info->key_part will point at keypart containing field a
, while key_part having field DB_ROW_HASH_1 will
-1 index. By this way I do not have to patch more of optimizer code. But
there is one problem , what should be the length of
key_part? I am thinking of it equal to field->pack_length(), this would not
work because while creating keys optimizer
calls get_key_image() (which is real data so can exceed pack_lenght() in
case of blob), so to get this work I have to patch
optimizer where it calls get_key_image() and see if key is HA_UNIQUE_HASH
. If yes then instead of get_key_image just use
memcpy(key, field->ptr(),
field->pack_length());
this wont copy the actual data, but we do not need actual data. I will
patch handler methods like ha_index_read, ha_index_idx_read ,
multi_range_read_info_const
basically handler methods which are related to index or range search.
In these methods i need to calculate hash , which I can calculate from
key_ptr but key_ptr doe not have actual data(in case
of blobs etc).So to get the date for hash , I will make a field clone of
(a,b,c etc) but there ptr will point in key_ptr. Then
field->val_str() method will work simply and i can calculate hash. And also
I can compare returned result with actual key in
handler method itself.
What do you think of this approach ?
Regards
sachin
On Sat, Aug 20, 2016 at 11:16 PM, Sergei Golubchik
Hi, Sachin!
On Aug 19, Sachin Setia wrote:
On Fri, Aug 19, 2016 at 2:42 PM, Sergei Golubchik
wrote: First. I believe you'll need to do your final evaluation soon, and it will need to have a link to the code. Did you check google guidelines about it? Is everything clear there? Do you need help publishing your work in a format that google requires?
They don't accept delays for any reasons, so even if your code is not 100% complete and ready, you'd better still publish it and submit the evaluation, because otherwise google will fail you and that'd be too sad.
If you'd like you can publish the google-way only the unique-constraint part without further optimizer work. Or at least please mention that you'd completed the original project and went working on extensions. I mean, it's better than saying "the code is not 100% complete" :)
Okay I am thinking of writing a blog post with a link to my github repository. Blog Link http://sachin1001gsoc.blogspot.in/2016/08/gsoc-2016.html Please check this.
I think that'll do, yes.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org